Novel MAD Nucleases

ABSTRACT

The present disclosure provides new RNA-guided nucleases for making rational, direct edits to nucleic acids in live cells.

RELATED CASES

This application is a continuation of U.S. Ser. No. 17/084,522, filed 29 Oct. 2020; which is a continuation of U.S. Ser. No. 16/837,212, filed 1 Apr. 2020, now U.S. Pat. No. 10,883,095; which claims priority to U.S. Ser. No. 62/946,282, filed 10 Dec. 2019, entitled “Novel MAD Nucleases.”

FIELD OF THE INVENTION

This invention relates to new nucleic acid-guided nucleases for making rational, directed edits to live cells.

BACKGROUND OF THE INVENTION

In the following discussion certain articles and methods will be described for background and introductory purposes. Nothing contained herein is to be construed as an “admission” of prior art. Applicant expressly reserves the right to demonstrate, where appropriate, that the methods referenced herein do not constitute prior art under the applicable statutory provisions.

The ability to make precise, targeted changes to the genome of living cells has been a long-standing goal in biomedical research and development. Recently, various nucleases have been identified that allow manipulation of gene sequence and, hence, gene function. These nucleases include nucleic acid-guided nucleases. The range of target sequences that nucleic acid-guided nucleases can recognize, however, is constrained by the need for a specific protospacer adjacent motif (PAM) to be located near the desired target sequence. PAMs are short nucleotide sequences recognized by a gRNA/nuclease complex where this complex directs editing of the target sequence. The precise PAM sequence and PAM length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering nucleic acid-guided nucleases or mining for new nucleic acid-guided nucleases may provide nucleases with altered PAM preferences and/or altered activity or fidelity; all changes that may increase the versatility of a nucleic acid-guided nuclease for certain editing tasks.

There is thus a need in the art of nucleic acid-guided nuclease gene editing for novel nucleases with varied PAM preferences, varied activity in cells from different organisms such as mammals and/or altered enzyme fidelity. The novel MAD-series nucleases described herein satisfy this need.

SUMMARY OF THE INVENTION

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the following written Detailed Description including those aspects illustrated in the accompanying drawings and defined in the appended claims.

The present disclosure provides mined MAD-series nucleases (e.g., RNA-guided nucleases or RGNs) with varied PAM preferences, and/or varied activity in mammalian cells.

Thus, in one embodiment there are provided MAD-series nuclease systems that perform nucleic acid-guided nuclease editing including a MAD2001 system comprising SEQ ID Nos. 1 (MAD2001 nuclease), 20 (gRNA repeat sequence) and 21 (gRNA tracr sequence); a MAD2001 system comprising SEQ ID Nos. 1 (MAD2001 nuclease), 22 (gRNA repeat sequence) and 23 (gRNA tracr sequence); a MAD2001 system comprising SEQ ID Nos. 1 (MAD2001 nuclease), 24 (gRNA repeat sequence) and 25 (gRNA tracr sequence); a MAD2007 system comprising SEQ ID Nos. 7 (MAD2007 nuclease), 26 (gRNA repeat sequence) and 27 (gRNA tracr sequence); a MAD2007 system comprising SEQ ID Nos. 7 (MAD2007 nuclease), 28 (gRNA repeat sequence) and 29 (gRNA tracr sequence); a MAD2007 system comprising SEQ ID Nos. 7 (MAD2007 nuclease), 30 (gRNA repeat sequence) and 31 (gRNA tracr sequence); a MAD2008 system comprising SEQ ID Nos. 8 (MAD2008 nuclease), 32 (gRNA repeat sequence) and 33 (gRNA tracr sequence); a MAD2008 system comprising SEQ ID Nos. 8 (MAD2008 nuclease), 34 (gRNA repeat sequence) and 35 (gRNA tracr sequence); a MAD2008 system comprising SEQ ID Nos. 8 (MAD2008 nuclease), 36 (gRNA repeat sequence) and 37 (gRNA tracr sequence); a MAD2009 system comprising SEQ ID Nos. 9 (MAD2009 nuclease), 38 (gRNA repeat sequence) and 39 (gRNA tracr sequence); a MAD2009 system comprising SEQ ID Nos. 9 (MAD2009 nuclease), 38 (gRNA repeat sequence) and 40 (gRNA tracr sequence); a MAD2009 system comprising SEQ ID Nos. 9 (MAD2009 nuclease), 41 (gRNA repeat sequence) and 42 (gRNA tracr sequence); a MAD2011 system comprising SEQ ID Nos. 11 (MAD2011 nuclease), 43 (gRNA repeat sequence) and 44 (gRNA tracr sequence); a MAD2011 system comprising SEQ ID Nos. 11 (MAD2011 nuclease), 45 (gRNA repeat sequence) and 46 (gRNA tracr sequence); and a MAD2011 system comprising SEQ ID Nos. 11 (MAD2011 nuclease), 47 (gRNA repeat sequence) and 48 (gRNA tracr sequence).
In some aspects, the MAD-series system components are delivered as sequences to be transcribed (in the case of the gRNA components) and transcribed and translated (in the case of the MAD-series nuclease), and in some aspects, the coding sequence for the MAD-series nuclease and the gRNA component sequences are on the same vector. In other aspects, the coding sequence for the MAD-series nuclease and the gRNA component sequences are on different vectors and in some aspects, the gRNA component sequences are located in an editing cassette which also comprises a donor DNA (e.g., homology arm). In other aspects, the MAD-series nuclease is delivered to the cells as a peptide or the MAD-series nuclease and gRNA components are delivered to the cells as a ribonucleoprotein complex.
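For convenience, the fifteen nuclease/repeat/tracr combinations enumerated in this Summary can be tabulated in a small lookup structure. The following is a minimal sketch only; the names MAD_SYSTEMS and systems_for are hypothetical and are introduced here purely for illustration. The integers are the SEQ ID Nos. recited above.

```python
# Hypothetical lookup table of the MAD-series systems recited in the Summary.
# Each guide entry is (gRNA repeat SEQ ID No., gRNA tracr SEQ ID No.).
MAD_SYSTEMS = {
    "MAD2001": {"nuclease": 1, "guides": [(20, 21), (22, 23), (24, 25)]},
    "MAD2007": {"nuclease": 7, "guides": [(26, 27), (28, 29), (30, 31)]},
    "MAD2008": {"nuclease": 8, "guides": [(32, 33), (34, 35), (36, 37)]},
    "MAD2009": {"nuclease": 9, "guides": [(38, 39), (38, 40), (41, 42)]},
    "MAD2011": {"nuclease": 11, "guides": [(43, 44), (45, 46), (47, 48)]},
}


def systems_for(nuclease_name):
    """Return (nuclease, repeat, tracr) SEQ ID No. triples for one nuclease."""
    entry = MAD_SYSTEMS[nuclease_name]
    return [(entry["nuclease"], repeat, tracr) for repeat, tracr in entry["guides"]]


if __name__ == "__main__":
    for name in MAD_SYSTEMS:
        print(name, systems_for(name))
```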

Additionally there are provided nickases and dead nucleases comprising a MAD2001 nickase 1 [SEQ ID No. 14]; a MAD2001 nickase 2 [SEQ ID No. 15]; a dead MAD2001 [SEQ ID No. 16]; a MAD2007 nickase 1 [SEQ ID No. 17]; a MAD2007 nickase 2 [SEQ ID No. 18]; a dead MAD2007 [SEQ ID No. 19]; a MAD2008 nickase 1 [SEQ ID No. 51]; a MAD2008 nickase 2 [SEQ ID No. 52]; a dead MAD2008 [SEQ ID No. 53]; a MAD2009 nickase 1 [SEQ ID No. 54]; a MAD2009 nickase 2 [SEQ ID No. 55]; a dead MAD2009 [SEQ ID No. 56]; a MAD2011 nickase 1 [SEQ ID No. 57]; a MAD2011 nickase 2 [SEQ ID No. 58]; and a dead MAD2011 [SEQ ID No. 59].

In addition, there are provided spacer sequence and PAM sequence pairs for MAD2007, including SEQ ID Nos. 60 and 61; SEQ ID Nos. 62 and 63; SEQ ID Nos. 64 and 65; SEQ ID Nos. 66 and 67; SEQ ID Nos. 68 and 69; SEQ ID Nos. 70 and 71; SEQ ID Nos. 72 and 73; SEQ ID Nos. 74 and 75; SEQ ID Nos. 76 and 77; SEQ ID Nos. 78 and 79; SEQ ID Nos. 80 and 81; SEQ ID Nos. 82 and 83; SEQ ID Nos. 84 and 85; SEQ ID Nos. 86 and 87; SEQ ID Nos. 88 and 89; SEQ ID Nos. 90 and 91; SEQ ID Nos. 92 and 93; SEQ ID Nos. 94 and 95; SEQ ID Nos. 96 and 97; SEQ ID Nos. 98 and 99; SEQ ID Nos. 100 and 101; SEQ ID Nos. 102 and 103; SEQ ID Nos. 104 and 105; SEQ ID Nos. 106 and 107; SEQ ID Nos. 108 and 109; SEQ ID Nos. 110 and 111; SEQ ID Nos. 112 and 113; SEQ ID Nos. 114 and 115; SEQ ID Nos. 116 and 117; SEQ ID Nos. 118 and 119; SEQ ID Nos. 120 and 121; SEQ ID Nos. 122 and 123; SEQ ID Nos. 124 and 125; and SEQ ID Nos. 126 and 127. Also, there are provided spacer sequence and PAM sequence pairs for MAD2001, including SEQ ID Nos. 128 and 129; SEQ ID Nos. 130 and 131; SEQ ID Nos. 132 and 133; SEQ ID Nos. 134 and 135; SEQ ID Nos. 136 and 137; and SEQ ID Nos. 138 and 139.

In yet another embodiment, there are provided additional MAD2007 sequences from Sharpea azabuensis comprising SEQ ID No. 142; SEQ ID No. 143; and SEQ ID No. 144.

These aspects and other features and advantages of the invention are described below in more detail.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an exemplary workflow for creating and screening mined MAD-series nucleases or RGNs.

FIG. 2 is a plot of protein size vs. search score for the novel nucleases discovered.

FIG. 3 is a schematic of gRNA designs for new MAD nucleases.

FIG. 4 shows the enrichment of targets cleaved by the mined MAD-series nucleases in the form of a sequence logo.

FIG. 5 is a series of bar graphs showing the activity of MAD2001, MAD2007, MAD2008, MAD2009 and MAD2011 in HEK293T cells.

FIG. 6 shows human genome coverage of MAD2001, MAD2007, MAD2008, MAD2009 and MAD2011.

FIG. 7 shows the percentage of GFP⁻ HEK293T cells relative to a negative control.

FIG. 8 is a sequence logo for the PAM of MAD2007 in HEK293T cells.

FIG. 9 shows the percentage of GFP⁻ HEK293T cells relative to a negative control.

FIG. 10 shows the percentage of loss of function in HEK293T cells for two different human codon optimized MAD2007 nucleases.

DETAILED DESCRIPTION

The description set forth below in connection with the appended drawings is intended to be a description of various, illustrative embodiments of the disclosed subject matter. Specific features and functionalities are described in connection with each illustrative embodiment; however, it will be apparent to those skilled in the art that the disclosed embodiments may be practiced without each of those specific features and functionalities. Moreover, all of the functionalities described in connection with one embodiment are intended to be applicable to the additional embodiments described herein except where expressly stated or where the feature or function is incompatible with the additional embodiments. For example, where a given feature or function is expressly described in connection with one embodiment but not expressly mentioned in connection with an alternative embodiment, it should be understood that the feature or function may be deployed, utilized, or implemented in connection with the alternative embodiment unless the feature or function is incompatible with the alternative embodiment.

The practice of the techniques described herein may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, biological emulsion generation, and sequencing technology, which are within the skill of those who practice in the art. Such conventional techniques include polymer array synthesis, hybridization and ligation of polynucleotides, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Green, et al., Eds. (1999), Genome Analysis: A Laboratory Manual Series (Vols. I-IV); Weiner, Gabriel, Stephens, Eds. (2007), Genetic Variation: A Laboratory Manual; Dieffenbach, Dveksler, Eds. (2003), PCR Primer: A Laboratory Manual; Bowtell and Sambrook (2003), DNA Microarrays: A Molecular Cloning Manual; Mount (2004), Bioinformatics: Sequence and Genome Analysis; Sambrook and Russell (2006), Condensed Protocols from Molecular Cloning: A Laboratory Manual; and Sambrook and Russell (2002), Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) W. H. Freeman, New York N.Y.; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman Pub., New York, N.Y.; Berg et al. (2002) Biochemistry, 5th Ed., W. H. Freeman Pub., New York, N.Y.; Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, eds., John Wiley & Sons 1998); Mammalian Chromosome Engineering: Methods and Protocols (G. Hadlaczky, ed., Humana Press 2011); Essential Stem Cell Methods, (Lanza and Klimanskaya, eds., Academic Press 2011), all of which are herein incorporated in their entirety by reference for all purposes. Nuclease-specific techniques can be found in, e.g., Genome Editing and Engineering From TALENs and CRISPRs to Molecular Surgery, Appasani and Church, 2018; and CRISPR: Methods and Protocols, Lindgren and Charpentier, 2015; both of which are herein incorporated in their entirety by reference for all purposes. Basic methods for enzyme engineering may be found in Enzyme Engineering Methods and Protocols, Samuelson, ed., 2013; Protein Engineering, Kaumaya, ed., (2012); and Kaur and Sharma, “Directed Evolution: An Approach to Engineer Enzymes”, Crit. Rev. Biotechnology, 26:165-69 (2006).

Note that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an oligonucleotide” refers to one or more oligonucleotides. Terms such as “first,” “second,” “third,” etc., merely identify one of a number of portions, components, steps, operations, functions, and/or points of reference as disclosed herein, and likewise do not necessarily limit embodiments of the present disclosure to any particular configuration or orientation.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated by reference for the purpose of describing and disclosing devices, methods and cell populations that may be used in connection with the presently described invention.

Where a range of values is provided, it is understood that each intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention.

The term “complementary” as used herein refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds. In general, a nucleic acid includes a nucleotide sequence described as having a “percent complementarity” or “percent homology” to a specified second nucleotide sequence. For example, a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence. For instance, the nucleotide sequence 3′-TCGA-5′ is 100% complementary to the nucleotide sequence 5′-AGCT-3′; and the nucleotide sequence 3′-TCGA-5′ is 100% complementary to a region of the nucleotide sequence 5′-TAGCTG-3′.
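The percent complementarity calculation described above can be illustrated with a short sketch. It assumes an ungapped, position-by-position comparison of one strand written 5′ to 3′ against a second strand written 3′ to 5′; the function names are hypothetical and do not correspond to any particular alignment tool.

```python
# Watson-Crick pairs: A with T (or U), and G with C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_5to3, strand_3to5):
    """Percent of aligned positions that form a Watson-Crick pair.

    strand_5to3 is written 5'->3' and strand_3to5 is written 3'->5', so the
    antiparallel strands are compared position by position without gaps.
    """
    if not strand_5to3 or len(strand_5to3) != len(strand_3to5):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (a, b) in PAIRS for a, b in zip(strand_5to3.upper(), strand_3to5.upper())
    )
    return 100.0 * paired / len(strand_5to3)


def best_region_complementarity(long_5to3, short_3to5):
    """Highest percent complementarity of short_3to5 to any region of long_5to3."""
    width = len(short_3to5)
    if width == 0 or width > len(long_5to3):
        raise ValueError("short strand must be non-empty and fit within the long strand")
    return max(
        percent_complementarity(long_5to3[i:i + width], short_3to5)
        for i in range(len(long_5to3) - width + 1)
    )


if __name__ == "__main__":
    # 3'-TCGA-5' against 5'-AGCT-3', and against a region of 5'-TAGCTG-3'.
    print(percent_complementarity("AGCT", "TCGA"))
    print(best_region_complementarity("TAGCTG", "TCGA"))
```

The two calls in the usage section correspond to the 3′-TCGA-5′ examples given in the definition.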

The term DNA “control sequences” refers collectively to promoter sequences, polyadenylation signals, transcription termination sequences, upstream regulatory domains, origins of replication, internal ribosome entry sites, nuclear localization sequences, enhancers, and the like, which collectively provide for the replication, transcription and translation of a coding sequence in a recipient cell. Not all of these types of control sequences need to be present so long as a selected coding sequence is capable of being replicated, transcribed and (for some components) translated in an appropriate host cell.

As used herein the term “donor DNA” or “donor nucleic acid” refers to nucleic acid that is designed to introduce a DNA sequence modification (insertion, deletion, substitution) into a locus by homologous recombination using nucleic acid-guided nucleases. For homology-directed repair, the donor DNA must have sufficient homology to the regions flanking the “cut site” or site to be edited in the genomic target sequence. The length of the homology arm(s) will depend on, e.g., the type and size of the modification being made. In many instances and preferably, the donor DNA will have two regions of sequence homology (e.g., two homology arms) to the genomic target locus. Preferably, an “insert” region or “DNA sequence modification” region (the nucleic acid modification that one desires to be introduced into a genome target locus in a cell) will be located between two regions of homology. The DNA sequence modification may change one or more bases of the target genomic DNA sequence at one specific site or multiple specific sites. A change may include changing 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence. A deletion or insertion may be a deletion or insertion of 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence.

The terms “guide nucleic acid” or “guide RNA” or “gRNA” refer to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a genomic target locus, and 2) a scaffold sequence capable of interacting or complexing with a nucleic acid-guided nuclease (see, e.g., FIG. 1).

“Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or, more often in the context of the present disclosure, between two nucleic acid molecules. The term “homologous region” or “homology arm” refers to a region on the donor DNA with a certain degree of homology with the target genomic DNA sequence. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences.

“Operably linked” refers to an arrangement of elements where the components so described are configured so as to perform their usual function. Thus, control sequences operably linked to a coding sequence are capable of effecting the transcription, and in some cases, the translation, of a coding sequence. The control sequences need not be contiguous with the coding sequence so long as they function to direct the expression of the coding sequence. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence. In fact, such sequences need not reside on the same contiguous DNA molecule (i.e., chromosome) and may still have interactions resulting in altered regulation.

A “promoter” or “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a polynucleotide or polypeptide coding sequence such as messenger RNA, ribosomal RNA, small nuclear or nucleolar RNA, guide RNA, or any kind of RNA transcribed by any class of RNA polymerase (I, II or III). Promoters may be constitutive or inducible and, in some embodiments (particularly many embodiments in which selection is employed), the transcription of at least one component of the nucleic acid-guided nuclease editing system is under the control of an inducible promoter.

As used herein the term “selectable marker” refers to a gene introduced into a cell, which confers a trait suitable for artificial selection. General use selectable markers are well-known to those of ordinary skill in the art. Drug selectable markers such as ampicillin/carbenicillin, kanamycin, chloramphenicol, erythromycin, tetracycline, gentamicin, bleomycin, streptomycin, rhamnose, puromycin, hygromycin, blasticidin, and G418 may be employed. In other embodiments, selectable markers include, but are not limited to, human nerve growth factor receptor (detected with a MAb, such as described in U.S. Pat. No. 6,365,373); truncated human growth factor receptor (detected with MAb); mutant human dihydrofolate reductase (DHFR; fluorescent MTX substrate available); secreted alkaline phosphatase (SEAP; fluorescent substrate available); human thymidylate synthase (TS; confers resistance to anti-cancer agent fluorodeoxyuridine); human glutathione S-transferase alpha (GSTA1; conjugates glutathione to the stem cell selective alkylator busulfan; chemoprotective selectable marker in CD34+ cells); CD24 cell surface antigen in hematopoietic stem cells; human CAD gene to confer resistance to N-phosphonacetyl-L-aspartate (PALA); human multi-drug resistance-1 (MDR-1; P-glycoprotein surface protein selectable by increased drug resistance or enriched by FACS); human CD25 (IL-2a; detectable by MAb-FITC); Methylguanine-DNA methyltransferase (MGMT; selectable by carmustine); and Cytidine deaminase (CD; selectable by Ara-C). “Selective medium” as used herein refers to cell growth medium to which has been added a chemical compound or biological moiety that selects for or against selectable markers.

The terms “target genomic DNA sequence”, “target sequence”, or “genomic target locus” refer to any locus in vitro or in vivo, or in a nucleic acid (e.g., genome) of a cell or population of cells, in which a change of at least one nucleotide is desired using a nucleic acid-guided nuclease editing system. The target sequence can be a genomic locus or extrachromosomal locus.

A “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to and/or expressed in a cell. Vectors are typically composed of DNA, although RNA vectors are also available. Vectors include, but are not limited to, plasmids, fosmids, phagemids, virus genomes, synthetic chromosomes, and the like. As used herein, the phrase “engine vector” refers to a vector comprising a coding sequence for a nuclease to be used in the nucleic acid-guided nuclease systems and methods of the present disclosure. The engine vector may also comprise, in a bacterial system, the λ Red recombineering system or an equivalent thereto. Engine vectors also typically comprise a selectable marker. As used herein the phrase “editing vector” refers to a vector comprising a donor nucleic acid, optionally including an alteration to the target sequence that prevents nuclease binding at a PAM or spacer in the target sequence after editing has taken place, and a coding sequence for a gRNA. The editing vector may also comprise a selectable marker and/or a barcode. In some embodiments, the engine vector and editing vector may be combined; that is, the contents of the engine vector may be found on the editing vector. Further, the engine and editing vectors comprise control sequences operably linked to, e.g., the nuclease coding sequence, recombineering system coding sequences (if present), donor nucleic acid, guide nucleic acid, and selectable marker(s).

Editing in Nucleic Acid-Guided Nuclease Genome Systems

RNA-guided nucleases (RGNs) have rapidly become the foundational tools for genome engineering of prokaryotes and eukaryotes. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems are adaptive immunity systems that protect prokaryotes against mobile genetic elements (MGEs). RGNs are a major part of this defense system because they identify and destroy MGEs. RGNs can be repurposed for genome editing in various organisms by reprogramming the CRISPR RNA (crRNA) that guides the RGN to a specific target DNA. A number of different RGNs have been identified to date for various applications; however, there are various properties that make some RGNs more desirable than others for specific applications. RGNs can be used for creating specific double strand breaks (DSBs), creating specific nicks of one strand of DNA, or guiding another moiety to a specific DNA sequence.

The ability of an RGN to specifically target any genomic sequence is perhaps the most desirable feature of RGNs; however, RGNs can only access their desired target if the target DNA also contains a short motif called PAM (protospacer adjacent motif) that is specific for every RGN. Type V RGNs such as MAD7, AsCas12a and LbCas12a tend to access DNA targets that contain YTTN/TTTN on the 5′ end whereas type II RGNs target DNA sequences containing a specific short motif on the 3′ end. An example well known in the art for a type II RGN is SpCas9 which requires an NGG on the 3′ end of the target DNA. Type II RGNs have substantially different domain architecture relative to type V RGNs. Further, type II RGNs also require a transactivating RNA (tracrRNA) in addition to a crRNA for optimal function. Compared to type V RGNs, the type II RGNs create a double-strand break closer to the PAM sequence, which is highly desirable for precise genome editing applications.
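The PAM constraint described above can be illustrated with a simple scan for candidate target sites. The sketch below makes several simplifying assumptions: it searches only the strand provided (not the reverse complement), it supports only a small subset of IUPAC codes, and the default spacer length of 20 is arbitrary. The function names are hypothetical. The NGG (3′) and TTTN (5′) motifs are the examples given in the text.

```python
import re

# Subset of IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_protospacers(seq, pam, pam_side, spacer_len=20):
    """List (spacer_start, spacer, pam) tuples found on the given strand.

    pam_side is "3prime" when the PAM follows the spacer (e.g., NGG) and
    "5prime" when the PAM precedes the spacer (e.g., TTTN).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        pam_end = pam_start + len(pam)
        if pam_side == "3prime":
            start, end = pam_start - spacer_len, pam_start
        elif pam_side == "5prime":
            start, end = pam_end, pam_end + spacer_len
        else:
            raise ValueError("pam_side must be '3prime' or '5prime'")
        # Keep only sites whose full spacer lies inside the sequence.
        if start >= 0 and end <= len(seq):
            hits.append((start, seq[start:end], seq[pam_start:pam_end]))
    return hits


if __name__ == "__main__":
    print(find_protospacers("ATATATATATAGGCC", "NGG", "3prime", spacer_len=10))
    print(find_protospacers("TTTCGATTACAGAT", "TTTN", "5prime", spacer_len=10))
```

A sequence lacking the required motif returns an empty list, which is the situation a nuclease with a different PAM preference is intended to address.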

A number of type II RGNs have been discovered so far; however, their use in widespread applications is limited by restrictive PAMs. For example, the PAM of SpCas9 occurs less frequently in AT-rich regions of the genome. New RGNs with new and less restrictive PAMs are beneficial for the field. Further, not all type II nucleases are active in multiple organisms. For example, a number of RGNs have been discussed in the scientific literature but only a few have been demonstrated to be active in vitro and fewer still are active in cells, particularly in mammalian cells. The present disclosure identifies multiple RGNs that have novel PAMs and are active in mammalian cells.

In performing nucleic acid-guided nuclease editing, the mined MAD-series nucleases or RGNs may be delivered to cells to be edited as a polypeptide; alternatively, a polynucleotide sequence encoding the mined MAD-series nuclease is transformed or transfected into the cells to be edited. The polynucleotide sequence encoding the mined MAD-series nuclease may be codon optimized for expression in particular cells, such as archaeal, prokaryotic or eukaryotic cells. Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells. Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammals including non-human primates. The choice of the mined MAD-series nuclease to be employed depends on many factors, such as what type of edit is to be made in the target sequence and whether an appropriate PAM is located close to the desired target sequence. The mined MAD-series nuclease may be encoded by a DNA sequence on a vector (e.g., the engine vector) and be under the control of a constitutive or inducible promoter. In some embodiments, the sequence encoding the nuclease is under the control of an inducible promoter, and the inducible promoter may be separate from but of the same type as an inducible promoter controlling transcription of the guide nucleic acid; that is, a separate inducible promoter may drive the transcription of the nuclease and guide nucleic acid sequences but the two inducible promoters may be the same type of inducible promoter (e.g., both are pL promoters). Alternatively, the inducible promoter controlling expression of the nuclease may be different from the inducible promoter controlling transcription of the guide nucleic acid; that is, e.g., the nuclease may be under the control of the pBAD inducible promoter, and the guide nucleic acid may be under the control of the pL inducible promoter.

In general, a guide nucleic acid (e.g., gRNA) complexes with a compatible nucleic acid-guided nuclease and can then hybridize with a target sequence, thereby directing the nuclease to the target sequence. With the MAD-series enzymes described herein, the nucleic acid-guided nuclease editing system uses two separate guide nucleic acid components that combine and function as a guide nucleic acid; that is, a CRISPR RNA (crRNA) and a transactivating CRISPR RNA (tracrRNA). The gRNA may be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct, or the coding sequence may reside within an editing cassette, and is under the control of a constitutive promoter or, in some embodiments, an inducible promoter as described below.

A guide nucleic acid comprises a guide polynucleotide sequence having sufficient complementarity with a target sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and the corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, or 20 nucleotides in length. Preferably the guide sequence is 10-30 or 15-20 nucleotides long, or 15, 16, 17, 18, 19, or 20 nucleotides in length.

In the present methods and compositions, the components of the guide nucleic acid are provided as a sequence to be expressed from a plasmid or vector, comprising both the guide sequence and the scaffold sequence as a single transcript under the control of a promoter and, in some embodiments, an inducible promoter. In general, to generate an edit in a target sequence, the gRNA/nuclease complex binds to a target sequence as determined by the guide RNA, and the nuclease recognizes a protospacer adjacent motif (PAM) sequence adjacent to the target sequence. The target sequence can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro. For example, the target sequence can be a polynucleotide residing in the nucleus of a eukaryotic cell. A target sequence can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide, an intron, a PAM, or “junk” DNA).

The guide nucleic acid may be part of an editing cassette that encodes the donor nucleic acid. Alternatively, the guide nucleic acid may not be part of the editing cassette and instead may be encoded on the engine or editing vector backbone. For example, a sequence coding for a guide nucleic acid can be assembled or inserted into a vector backbone first, followed by insertion of the donor nucleic acid in, e.g., the editing cassette. In other cases, the donor nucleic acid in, e.g., an editing cassette can be inserted or assembled into a vector backbone first, followed by insertion of the sequence coding for the guide nucleic acid. In yet other cases, the sequence encoding the guide nucleic acid and the donor nucleic acid (inserted, for example, in an editing cassette) are simultaneously but separately inserted or assembled into a vector. In yet other embodiments, the sequence encoding the guide nucleic acid and the sequence encoding the donor nucleic acid are both included in the editing cassette.

The target sequence is associated with a PAM, which is a short nucleotide sequence recognized by the gRNA/nuclease complex. The precise PAM sequence and length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering of the PAM-interacting domain of a nucleic acid-guided nuclease may allow for alteration of PAM specificity, improve fidelity, or decrease fidelity. In certain embodiments, the genome editing of a target sequence both introduces a desired DNA change to a target sequence, e.g., the genomic DNA of a cell, and removes, mutates, or renders inactive a protospacer adjacent motif (PAM) region in the target sequence. Rendering the PAM at the target sequence inactive precludes additional editing of the cell genome at that target sequence, e.g., upon subsequent exposure to a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid in later rounds of editing. Thus, cells having the desired target sequence edit and an altered PAM can be selected using a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid complementary to the target sequence. Cells that did not undergo the first editing event will be cut, producing a double-stranded DNA break, and thus will not continue to be viable. The cells containing the desired target sequence edit and PAM alteration will not be cut, as these edited cells no longer contain the necessary PAM site, and will continue to grow and propagate.
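The selection logic described above, in which a cell survives only if its target site can no longer be cut, can be sketched as follows. This is a toy model under stated assumptions: the PAM is taken to lie immediately 3′ of the spacer, only the given strand is examined, the spacer must match exactly, and the spacer sequence, the NGG-style PAM, and all names are hypothetical values chosen for illustration rather than sequences from this disclosure.

```python
import re


def is_cuttable(genome, spacer, pam_regex):
    """True if the spacer immediately followed by a matching PAM is present."""
    pattern = re.escape(spacer.upper()) + pam_regex
    return re.search(pattern, genome.upper()) is not None


def surviving_cells(cells, spacer, pam_regex):
    """Names of cells whose target site is no longer recognized and cut."""
    return [name for name, genome in cells.items()
            if not is_cuttable(genome, spacer, pam_regex)]


if __name__ == "__main__":
    spacer = "GATTACAGATTACAGATTAC"   # hypothetical 20-nt spacer
    pam = "[ACGT]GG"                  # hypothetical NGG-style 3' PAM
    cells = {
        # Unedited cell: intact PAM (TGG) follows the spacer.
        "unedited": "CC" + spacer + "TGG" + "AA",
        # Edited cell: the donor changed the PAM from TGG to TGC.
        "edited": "CC" + spacer + "TGC" + "AA",
    }
    print(surviving_cells(cells, spacer, pam))
```

In this model the unedited genome is still recognized and would be cut, while the genome with the altered PAM escapes cutting and is retained.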

As mentioned previously, the range of target sequences that nucleic acid-guided nucleases can recognize is constrained by the need for a specific PAM to be located near the desired target sequence. As a result, it often can be difficult to target edits with the precision that is necessary for genome editing. It has been found that nucleases can recognize some PAMs very well (e.g., canonical PAMs), and other PAMs less well or poorly (e.g., non-canonical PAMs). Because the mined MAD-series nucleases disclosed herein may recognize different PAMs, the mined MAD-series nucleases increase the number of target sequences that can be targeted for editing; that is, mined MAD-series nucleases decrease the regions of “PAM deserts” in the genome. Thus, the mined MAD-series nucleases expand the scope of target sequences that may be edited by increasing the number (variety) of PAM sequences recognized. Moreover, cocktails of mined MAD-series nucleases may be delivered to cells such that target sequences adjacent to several different PAMs may be edited in a single editing run.

Another component of the nucleic acid-guided nuclease system is the donor nucleic acid. In some embodiments, the donor nucleic acid is on the same polynucleotide (e.g., editing vector or editing cassette) as the guide nucleic acid and may be (but not necessarily) under the control of the same promoter as the guide nucleic acid (e.g., a single promoter driving the transcription of both the guide nucleic acid and the donor nucleic acid). For cassettes of this type, see U.S. Pat. Nos. 10,240,167; 10,266,849; 9,982,278; 10,351,877; 10,364,442; 10,435,715; and 10,465,207. The donor nucleic acid is designed to serve as a template for homologous recombination with a target sequence nicked or cleaved by the nucleic acid-guided nuclease as a part of the gRNA/nuclease complex. A donor nucleic acid polynucleotide may be of any suitable length, such as about or more than about 20, 25, 50, 75, 100, 150, 200, 500, or 1000 nucleotides in length. In certain preferred aspects, the donor nucleic acid can be provided as an oligonucleotide of between 20-300 nucleotides, more preferably between 50-250 nucleotides. The donor nucleic acid comprises a region that is complementary to a portion of the target sequence (e.g., a homology arm). When optimally aligned, the donor nucleic acid overlaps with (is complementary to) the target sequence by, e.g., about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90 or more nucleotides. In many embodiments, the donor nucleic acid comprises two homology arms (regions complementary to the target sequence) flanking the mutation or difference between the donor nucleic acid and the target template. The donor nucleic acid comprises at least one mutation or alteration compared to the target sequence, such as an insertion, deletion, modification, or any combination thereof compared to the target sequence.
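As an illustration of the two-homology-arm layout described above, the following Python sketch assembles a donor from a target sequence, the coordinates of the region to be changed, and a replacement sequence. The function name, the 30-nucleotide arm length, and the choice to write the donor on the same strand as the target (rather than as its complement) are assumptions made for this example only.

```python
def build_donor(target, edit_start, edit_end, replacement, arm_len=30):
    """Return left homology arm + replacement + right homology arm.

    target      -- target sequence, written 5'->3'
    edit_start  -- 0-based index of the first base to be replaced
    edit_end    -- 0-based index one past the last base to be replaced
    replacement -- sequence carrying the desired edit (may be empty for a deletion)
    arm_len     -- length of each homology arm (hypothetical default of 30 nt)
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit coordinates fall outside the target")
    if edit_start < arm_len or len(target) - edit_end < arm_len:
        raise ValueError("target too short for the requested homology arms")
    left_arm = target[edit_start - arm_len:edit_start]
    right_arm = target[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm


# Hypothetical example: swap a GGG triplet for CCC between two 30-nt arms.
target = "A" * 30 + "GGG" + "T" * 30
donor = build_donor(target, 30, 33, "CCC")
print(len(donor), donor)
```

With 30-nucleotide arms and a three-base replacement the donor is 63 nucleotides long, which falls within the 50-250 nucleotide range given above as a preferred oligonucleotide length.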

Often the donor nucleic acid is provided as an editing cassette, which is inserted into a vector backbone where the vector backbone may comprise a promoter driving transcription of the gRNA and the coding sequence of the gRNA, or the vector backbone may comprise a promoter driving the transcription of the gRNA but not the gRNA itself. Moreover, there may be more than one, e.g., two, three, four, or more guide nucleic acid/donor nucleic acid cassettes inserted into an engine vector, where each guide nucleic acid is under the control of separate different promoters, separate like promoters, or where all guide nucleic acid/donor nucleic acid pairs are under the control of a single promoter. In some embodiments the promoter driving transcription of the gRNA and the donor nucleic acid (or driving more than one gRNA/donor nucleic acid pair) is an inducible promoter. Inducible editing is advantageous in that isolated cells can be grown for several to many cell doublings to establish colonies before editing is initiated, which increases the likelihood that cells with edits will survive, as the double-strand cuts caused by active editing are largely toxic to the cells. This toxicity results both in cell death in the edited colonies, as well as a lag in growth for the edited cells that do survive but must repair and recover following editing. However, once the edited cells have a chance to recover, the size of the colonies of the edited cells will eventually catch up to the size of the colonies of unedited cells. See, e.g., U.S. Pat. Nos. 10,533,152; 10,550,363; 10,532,324; and U.S. Ser. No. 16/597,826, filed 9 Oct. 2019; Ser. No. 16/597,831, filed 9 Oct. 2019; Ser. No. 16/693,630, filed 25 Nov. 2019; Ser. No. 16/687,640, filed 18 Nov. 2019; and Ser. No. 16/686,066, filed 15 Nov. 2019. Further, a guide nucleic acid may be efficacious in directing the edit of more than one donor nucleic acid in an editing cassette; e.g., if the desired edits are close to one another in a target sequence.

In addition to the donor nucleic acid, an editing cassette may comprise one or more primer sites. The primer sites can be used to amplify the editing cassette by using oligonucleotide primers; for example, if the primer sites flank one or more of the other components of the editing cassette.

In addition, the editing cassette may comprise a barcode. A barcode is a unique DNA sequence that corresponds to the donor DNA sequence such that the barcode can identify the edit made to the corresponding target sequence. The barcode typically comprises four or more nucleotides. In some embodiments, the editing cassettes comprise a collection of donor nucleic acids representing, e.g., gene-wide or genome-wide libraries of donor nucleic acids. The library of editing cassettes is cloned into vector backbones where, e.g., each different donor nucleic acid is associated with a different barcode.

Additionally, in some embodiments, an expression vector or cassette encoding components of the nucleic acid-guided nuclease system further encodes one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the nuclease comprises NLSs at or near the amino-terminus of the mined MAD-series RGN, NLSs at or near the carboxy-terminus of the mined MAD-series RGN, or a combination.

The engine and editing vectors comprise control sequences operably linked to the component sequences to be transcribed. As stated above, the promoters driving transcription of one or more components of the mined MAD-series nuclease editing system may be inducible, and an inducible system is likely employed if selection is to be performed. A number of gene regulation control systems have been developed for the controlled expression of genes in plant, microbe, and animal cells, including mammalian cells, including the pL promoter (induced by heat inactivation of the CI857 repressor), the pBAD promoter (induced by the addition of arabinose to the cell growth medium), and the rhamnose inducible promoter (induced by the addition of rhamnose to the cell growth medium). Other systems include the tetracycline-controlled transcriptional activation system (Tet-On/Tet-Off, Clontech, Inc. (Palo Alto, Calif.); Bujard and Gossen, PNAS, 89(12):5547-5551 (1992)), the Lac Switch Inducible system (Wyborski et al., Environ Mol Mutagen, 28(4):447-58 (1996); DuCoeur et al., Strategies 5(3):70-72 (1992); U.S. Pat. No. 4,833,080), the ecdysone-inducible gene expression system (No et al., PNAS, 93(8):3346-3351 (1996)), the cumate gene-switch system (Mullick et al., BMC Biotechnology, 6:43 (2006)), and the tamoxifen-inducible gene expression system (Zhang et al., Nucleic Acids Research, 24:543-548 (1996)) as well as others.

Typically, performing genome editing in live cells entails transforming cells with the components necessary to perform nucleic acid-guided nuclease editing. For example, the cells may be transformed simultaneously with separate engine and editing vectors; the cells may already be expressing the mined MAD-series nuclease (e.g., the cells may have already been transformed with an engine vector or the coding sequence for the mined MAD-series nuclease may be stably integrated into the cellular genome) such that only the editing vector needs to be transformed into the cells; or the cells may be transformed with a single vector comprising all components required to perform nucleic acid-guided nuclease genome editing.

A variety of delivery systems can be used to introduce (e.g., transform or transfect) nucleic acid-guided nuclease editing system components into a host cell. These delivery systems include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires, and exosomes. Alternatively, molecular trojan horse liposomes may be used to deliver nucleic acid-guided nuclease components across the blood brain barrier. Of particular interest is the use of electroporation, particularly flow-through electroporation (either as a stand-alone instrument or as a module in an automated multi-module system) as described in, e.g., U.S. Pat. Nos. 10,435,713; 10,443,074; 10,323,258; and 10,415,058.

After the cells are transformed with the components necessary to perform nucleic acid-guided nuclease editing, the cells are cultured under conditions that promote editing. For example, if constitutive promoters are used to drive transcription of the mined MAD-series nucleases and/or gRNA, the transformed cells need only be cultured in a typical culture medium under typical conditions (e.g., temperature, CO₂ atmosphere, etc.). Alternatively, if editing is inducible (by, e.g., activating inducible promoters that control transcription of one or more of the components needed for nucleic acid-guided nuclease editing, such as, e.g., transcription of the gRNA, donor DNA, nuclease, or, in the case of bacteria, a recombineering system), the cells are subjected to inducing conditions.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent or imply that the experiments below are all of or the only experiments performed. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific aspects without departing from the spirit or scope of the invention as broadly described. The present aspects are, therefore, to be considered in all respects as illustrative and not restrictive.

Example 1 Exemplary Workflow Overview

FIG. 1 shows an exemplary workflow 100 for creating and for in vitro screening of mined MAD-series enzymes. In a first step 101, a vector was prepared and cloned to make a template vector into which the coding sequences for the mined MAD-series RGNs are inserted. In another step 103, metagenome mining was performed to identify putative RGNs of interest based on, e.g., sequence, potential PAM and likelihood of activity. Once putative RGNs of interest were identified in silico, cassettes were constructed 105 and cloned into the vector backbone and then transformed into cells, thereby generating a library of mined MAD-series RGNs. The cells transformed with the mined MAD-series RGNs were arrayed in 96-well plates 107 for storage.

At step 109, an aliquot of the cells from each well was taken, and the mined MAD-series RGNs were amplified from each aliquot. In parallel, gRNA libraries were amplified 110 for each mined MAD-series RGN. At step 111, amplified PCR fragments expressing the gRNA libraries were combined with the amplified mined MAD-series RGNs to perform in vitro transcription and translation to make active ribonucleoprotein complexes 113. A synthetic target library was constructed 115 in which to test target depletion 117 for each of the mined MAD-series RGNs. After target depletion, amplicons were produced for analysis using next-gen sequencing 119 and sequencing data analysis was performed 121 to determine target depletion.
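The text above does not specify how target depletion is computed from the sequencing data. One common way to express it, offered here only as a hedged sketch, is the log2 ratio of a target's read frequency in an uncut control sample to its frequency in the nuclease-treated sample. The function name, the pseudocount, and the read counts are hypothetical and are not taken from the disclosure.

```python
import math


def depletion_scores(control_counts, treated_counts, pseudocount=1.0):
    """Return log2(control frequency / treated frequency) for each target.

    Positive scores indicate targets that were depleted (cut) in the treated
    sample. A pseudocount is added to every target to avoid division by zero.
    """
    targets = set(control_counts) | set(treated_counts)
    control_total = sum(control_counts.values()) + pseudocount * len(targets)
    treated_total = sum(treated_counts.values()) + pseudocount * len(targets)
    scores = {}
    for target in targets:
        control_freq = (control_counts.get(target, 0) + pseudocount) / control_total
        treated_freq = (treated_counts.get(target, 0) + pseudocount) / treated_total
        scores[target] = math.log2(control_freq / treated_freq)
    return scores


# Hypothetical read counts keyed by the PAM carried on each target.
control = {"AGG": 99, "ATT": 99}
treated = {"AGG": 9, "ATT": 189}
for pam, score in sorted(depletion_scores(control, treated).items()):
    print(pam, round(score, 2))
```

In this toy example the AGG target is roughly tenfold depleted and scores positive, while the ATT target is enriched relative to the control and scores negative.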

Example 2 Metagenome Mining

Metagenome-assembled genomes (MAGs) from various sources including GenBank BioProject accession numbers PRJNA348753, PRJNA385857, PRJNA432584 and PRJNA434545 were used to search for novel, putative CRISPR nucleases using HMMER hidden Markov model searches. Hundreds of potential nucleases were identified. FIG. 2 shows the novel RGNs found, plotting protein size vs. HMMER search score. For each MAG with a nuclease present, putative CRISPR arrays were identified and spacer sequences extracted. These spacers were then used as queries to search the JGI IMG/VR viral metagenome database (Paez-Espino et al., Nucleic Acids Res. 2017 Jan. 4; 45(D1): D457-D465) and predict putative PAM sequences based on viral sequences adjacent to spacer hits. Based on the sequence, potential PAM and confidence that the nuclease is likely active, 13 nucleases were identified (Table 1) for in vitro validation. The sequence of each of the 13 nucleases is shown in Table 2.
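The step of predicting a putative PAM from viral sequences adjacent to spacer hits can be pictured as a per-position consensus over the flanking sequences. The Python sketch below is an assumption about how such a consensus might be called, not the procedure actually used: the function name, the 75% agreement threshold, and the example flanks are hypothetical.

```python
from collections import Counter


def predict_pam(flanks, min_fraction=0.75):
    """Call a consensus base per position, or N when no base dominates.

    flanks       -- sequences found immediately adjacent to spacer hits,
                    all oriented the same way and aligned at their first base
    min_fraction -- fraction of flanks that must agree to call a base
                    (hypothetical threshold)
    """
    if not flanks:
        return ""
    length = min(len(flank) for flank in flanks)
    consensus = []
    for pos in range(length):
        counts = Counter(flank[pos].upper() for flank in flanks)
        base, count = counts.most_common(1)[0]
        consensus.append(base if count / len(flanks) >= min_fraction else "N")
    return "".join(consensus)


# Hypothetical flanking sequences taken 3' of four spacer hits.
print(predict_pam(["AGGT", "CGGA", "TGGC", "GGGT"]))
```

Positions where no base reaches the threshold are reported as N, so a weakly conserved flank yields a short or degenerate prediction, consistent with several entries in Table 1 having no predicted PAM.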

TABLE 1

| Mined MAD-series Name | Source | Predicted PAM | Protein size (aas) | Active | Measured PAM |
|---|---|---|---|---|---|
| MAD2001 | Methanomassiliicoccaceae archaeon UBA75 | | 1322 | yes | NNRC |
| MAD2002 | Alphaproteobacteria bacterium UBA756 | | 997 | | |
| MAD2003 | Butyricimonas virosa | CGGTT | 1392 | | |
| MAD2004 | Mariprofundus sp. UVA 1536 | | 1133 | | |
| MAD2005 | Micavibrio sp. UBA2341 | | 1049 | | |
| MAD2006 | Bacteroidales bacterium UBA3382 | ATTTNN | 1501 | | |
| MAD2007 | Kandleria sp. UBA3674 | NNCCART | 1211 | yes | NNNSR |
| MAD2008 | Flavobacteriaceae bacterium UBA3591 | | 1490 | yes | NNAA |
| MAD2009 | Fibrobacter sp. UBA4297 | CNAAAG | 1511 | yes | NNAA |
| MAD2010 | Sulfurospirillum sp. UBA5727 | | 1048 | | |
| MAD2011 | Enterococcus faecalis | | 1337 | yes | NGG |
| MAD2012 | Thiobacillus denitrificans | | 1087 | | |
| MAD2013 | Flavobacterium sp. UBA6135 | NANNGT | 1123 | | |

TABLE 2 Mined MAD- SEQ series ID Name No. Sequence MAD2001 1MKNTNEDYYLGLDIGTDSVGWAVTDKEYNILEFRRKPMWGIHLFEGGSTAQKTRVYRTSRRRLKRRAERIALLRDIFSEEIGKVDPGFFERLDESDLHLEDRVTSQKNSLFDDPEFNDKDLHKRFPTIYHLRRHLMHSNRKEDIRLIYLAAHHIIKFRGHFLYKGIGDEEIPSFEIVLNSLIDNLRDEYGMELEVSDRDLVKALLSDFSIGIREKSRELSSCLNAESENEKALVDFISGKKTNMKKLFDDEALDKMSFSLRDSGFEDQLRENEGVLGPERVHTLELSRQIFEWARLSSILKDSDSISEAKIKDYDQHREDLRMLKRAVKKYAPDKYSEVFKSKEHTGNYCSYVYVCGKGLPDKKCSTEEFQKYLKKILDDSGVRDDEEFKTLIQRLDAGILCPKQRTGENSVIPYSVHRKELIGILNNAAEHYPSLSRKGEDGFSSIDKILMLEEFRIPYYVGPLDDRSSRSWLIRNSFEAITPWNFNEIVDEDETSERFIGNLTSMCTYLGGEKVLPKNSLLYSRFMLYNEINNLRVGGEKIPAALKNKMVSELFANRATSSKVTLKELKAFLKGEGVLTDADEISGIDDGVKSTLRSEILIRKIIGDKISDREMAEEIVRILTVFGDERRRSKAKLKKEFSDKLTEKEIEKLSSLKFDGWGRLSEKFLTGLRQEVNGRSMSIIEILEDTNYNLQETLSKYSFNEIIDSYNEVLTSGPRSISYDILKDSYLSPAVKRGVWRALSVVKDILKAVGRPPKKIFVETTREEREKKRTESRKDALMYLYKSCKETEWEKRLDSVEESSLRNRSLYLYYTQLGKCMYCGKNIDIGELNTDLADRDHIYPQSKTKDDSIRNNLVLVCRGCNQAKGDRYPLPQEWVSRMHAFWTMLKDKGYISSEKYRRLTRRGELTEEEFGAFINRQLVETSQSAKAVITVLKNAFKDSDIVYVKGSNVSDFRSSYNFIKCRSVNDYHHAKDAYLNIVVGNVLDTKFTKNPSYVLKNREQYNIGRMYDRNVSRFGVDAWVAGDRGSIATVRKYMRRNNILFTRYATKSKGALFKETVHRKKEGLFERKKGLETEKYGGYSDISTSYLTLLEYDKGKKRIRSLEIVPTYFANTRPKEEDVIRFFSETRGLANVRVVMPEVRMKSLFEYRGFRFHVTGSNGKGRFWISSAIQLLLPENLYAYCKSIENNEKDSQRRSEKPLQNYGFSSEMNIELFKCLMDKAAKPPYDVKLSTLSKNLEEGFEKFKALELGPQVKVLQQILDIYSCDRKSGDLSVLGSARNAGRLDMNGVLSEADGEQVTMICQS PSGLFEKRVPMNEK MAD20022 
MKKRIFGFDIGIASLGWAVVDFDDTADPENDIYPTGEIVKSGVRCFPVAENPKDGSSLAQPRRQKRLLRRLCRRKARRMAGIKNLFVANGLIGKDALFNEKSNIYKARDNADVWDLRVKALTDKLTTIEFIRVLTHLAKHRGFKSYRIAAEKADAESGKVLEAVKANRALLENGKTLAQIIVEKGGQKRNREKMIVKNGKTEKQASYENSIPRDEIERETRLIFEKQRAFGLEAANEKLQRDFEKIAFRFREIKSKNIEKMIGKCEFEKDEPRAPKNAPSAEFFVAWTKINNCRVREPDGKIRFLTQEEKENVFNLLKDQKEVKYSALKKALFAKRPDVQFTDIEYNPKPVYDKKTGEIIEKTENPENQKFFSLKGWHDLKSVIDVSSYPVETLDKIATVIATKKNDTDIAKGLKELNLPDAEIEKLTSLSFSKFIRLSLKALYKILPEMQKGMKYNEACDAVGYDFKSTGESFAAQKGKFLPPIPEALATTVPVVNRAMTQFRKVYNALAREYGTPDQINIELARDVYNTHDERKKIADKQKEYGEERKKARDLAQEKMEIENISGRDLLKFRLYEQQDGKCIYSGETLDLRRLTEQDYCDVDHIIPYSRSLDNSQNNKVLCLSRENRRKSDKTPLEYIIDPVKQAEFIARVKSMKGLSAPKRDRLLIRDFKEKELEFRDRNINDTRYMARYIMKYLDDCIDFSGSQTDIKDHVQSRIGSLTDFLRHQWGLHKDRNENDRHHAQDAIVIACATNGYTQYLAHLSKIPENKQAYANKYGQPWYKAFKQHVKQPWDGFYQDVQASLAEIFVSRPPRKNATGEVHQDTIRTLNPNKPQYSEKDVKSGIKLRGGLANNGDMLRVDVFSKKNAKGKEQFYLVPIYLADRIKPELPNKAIVANKSESEWIIMDATYSFKFSLYMDDLVSVIKGDKKIFGYYKGTSRSTASITIEGHDRNFIQPSIGVKTVDNIKKYQIDPLGRYVEVKSEIRLPLNIKKRKS MAD2003 
3MKKVLGLDLGSSSIGWAYVHEAENEAELGSSKIIKLGVRVNPLTVDEQRNFEQGKSITTNASRTLKRCMRRNLQRYKLRRENLIEVLKKHGFISDASILSEQGNYTTFETYRLRAKAAVAEISLEELARVLLMINKKRGYKSSRKNRGGDEGKFIDGISVAKQLYDRGITPGQFSLELLKEGKRHLPDYYRSDLQNELDRIWNFQQSFYPEILTQNFREQIRDKGQKNTSQIFLREYQIYTADNKGADKLSRALQWRVEGLSRKLSVEELAFVMSDLNGSISGSSGYLGAIGDRSKELYLGKQTVGQYLMEKLNTNPNGSLKSKVFYRQDYLDEFERIWETQAGFHKELTLELKKEIRDIIIFYQRSLKSQKGLISFCELESKLVEIEVNGKMRRKVVGSRVCPKSSPLFQEFKIWSILNNICVWSVDKSKSSEARKMDERDKEPNLNQEEKEILFKELSLKEKLSKRDVLELLFEDARKLDMNYEKVEGNRTQATLFKAYQEIIARSGHGEYDFTRMLSSEILEIVSGVFDGLGYNTDILYFNSEGELDQQPLYRLWHLLYSFEGDKSNSGNENLINKITNLYGFDREYAVILADVVFPPDYGNLSAKAIHKILPYLKDGNKYSLACEYAGFRHSKNSLTKEEREKRVLKERLDILPKNTLRNPVVEKILNQMVHVVNGVINKYGKPDEIRIELARELKKNAKEREEWTRAINKSTIENEKLRSVLKKEFGFTQVSRNDIVRYKLYLELESRGFKTLYSNTYIPLEKLFFKEFDIEHIIPQSRLFDDSFSNKTIELRSVNQEKDNQTAYDYVSGKGGEAGLQEYLERVEDLFKGGYINKAKYNKLRMTGKDIPDDFIDRDLRDTQYIARRAKAMLEEVVGNVVSTSGAVTDRLREDWQLVDVMKELNWNKYERLGLTEIVEDRDGRKIRRIKGWTKRNDHRHHAMDALTIAFTKPKYVQYLNNLNARGDKSSSVYGIERDELSRDSKGKLRFNSPMPLKEFRMEAKLHLENVLVSTKAKNKVITPNVNKSKKRGGMNQKVQLTPRGQLHQETIYGSIKQYVTKEVKVGSAFNMEMILKVANKAYREALLKRLNAFDQDAKKAFTGKNSLEKNPIFINDSHTCKVPEKVKVVSFETVYTIRKEIGPDLKVDKVIDKRVRDILETRLVEFGGDSKLAFTNLDENPIWLNKEKGIDIKRVTISGVSNVIALHDKLDKDGKLVLDEHGQPQPVDFVCSGNNHHVVVYRDPEGKIQDDVVSFFEATVRAKEGLPVIDREYKKQEGWEFLFSMKQNEYFVFPNEETGFDPKEVDLMNPDNYALISPNLFRVQTMSRVMYGNQVVRDYKFRHHLETTVKDCKELKDIAYKQYKSLDFASQIVKVRIDHIGQIVHVGEY MAD2004 
4MADKEKLKETYTIGLDIGIASVGWAILGENRIIDLGVRCFDKAETAKEGESLNLSRRMARLMRRRLRRRAWRLTKLARLLKRVGLIADVGVLKQPPSKGFQTPNLWQLRVEGLDRKLGDDEWARVIYHLCKHRGFHWISKAEAKAADSDKEGGKVKQGLAGTKRLMQEKSYRTAAEMVLAEFPDAQRNKQGEYSKALSRELLCEELKELFKQQRAFGHTHADDKLETNILGNGDKKSGLFWVQKPSLSGEALLKMLGKCTFEKDEYRAAKACFTAERHVLLTRINNLRIVENGKMRGLTADERRIALWQPYQQAGDFTFKQLGSALEKHGSLLKGGYKFAGLTYPRETDEKAKNPETATLVKIPAWQELKKTLIGVGLEREWQGMADAALNGKPDLLDKIGWVLSVYKEDDEVELELGKLQLAANVIHALQTVRFDKFSNLSLLALCKILPQMENGMRYDEACKEAGYQHSMPDMQDLEAKVRYLPPFYSGREKDGRLKFNEDMDIPRNPVVLRALNQARKVVNALIRKYGSPHAVHIEMARDLSRPIDERRKIERDQADYRTKNEDARKAFASDFGFEPKGRQFEKYMLYREQQAKCAYSLAPLDLNRVLNDQGYAEVDHALPYSRSFDDGKNNRVLVLTSENRNKGNQTPYEYLDGTSDSEKWRLFESFVSGNKAYSQAKRNRLLKKDFGVKETQDFKERNLNDTRYICRFFKNYVEQYLQLHEESDAKRCVVVSGQLTSLLRFRWGINKIRSESDRHHALDAAVVAACSHGLVKRMSDYSRRKELGQVRDRIEKVDKKTGEIIDHFPSPWAHFRQELLARLHIDDANELRAVVENLGTYPPEALESLTPLFVSRAPQRRNSGAAHKETIYAQPEAMKEKGSVTQKVAVTSLKPADVDKLIDPERNVKLYAYLRKWLAGKDEREKRAKAIEASAGRGKEKRDLTPEEKIEIERLRALPQKPDKQGKPTGPIVRAVTMVIDKLSGIPVRGGIAKNDTMLRVDMFSKAKRYYLVTVYVFHSVAKELPSRAIVAHKDEDDWTVISEDFEFCFSMYPNDFIRISQKKETFMGYYAGCDRGSGNVNLWSHDRNSQIGKSGMIRGIGVKTAVNVEKFNVDVLGNIYPAPPEIRRELA MAD2005 
5MGYILGIDIGIASIGFAGVNHDLKKILFSGVHIFEAAENPKTGASLAEPRRTARGQRRVIHRRAQRKNAIRQLLLRHGLNCLSVVDKKYEPTGKNTPPISPWDLRRTALDRKLTDEELVRILFHIGKHRGFQSNKKSQSNEGDDGKALKGAGDLEQKWIQSGEKTIGAYLSTQSKKRNGNESYDNFIKRDWLREEIKVIFEAQRKFNQIKATEVLRLEYAGTGEKAKRNTPEGDGIAFYQRPLQSSEKLIGDCTFEKGEKRAPKFSYTAELFVLWSRLNNTKIKIQNGDERFLTQDEKNKLVNLAHKNKGGVSYTQARKEIGLNESERFNISYRQLDKGDNSWEKIRNEAEKSNFLKLSGFHALHEALDTGSATDWQKWIGSDRDKLDEVAYITSFIEDGKIIREKYQKLGLNEDQIKKLCEIKNFSKTVDLSLKALRNILPELEKGLRYDEACKALNYNNQPENKGLSKVPKFEDVRNPVVNRALGQTRKVINACIREYGLPDTIVVELAREVGKNFRDRKDIEKEQKTNEARRNTAKTHIAEILGIIEDNVTGEDILKYRLWKEQDCFCPYSGAYITPEMLRDSTSVQIDHIIPYSRSWDNSYMNKVLVLTTENQKKKNDTPFEYLGKTNRWEALEVFARQLPPKKAERLLTENFDDKKAGEWKDRALNDTRYMARLLKTHLEQSLDLGKGNRVQTRNGSLTAHLRGAWGFPDKNRRNDRHHALDAIVIACSTQSMVQGLTNWNKYEARRKNPAERPLPPKPWESFREDAKESVNSIFVSRMPVRTISGAAHEDTIRSIRKSDGKIIQRIKLKDFKKDTLENMVDKARNIKLYDILKERLDAHGGDAKKAFATPVYMPVNDPSKPAPRINSVRILTNEKSGIEINHGLASNGDMVRVDVFKKDNKFWLVPIYVHHFAEDKLPNKAIMQGKDEREWEEMNDDDFMFSLYRNDLIKVTTKKETMLVYFGGLDRATGNISIKAHDRDPSFGTNGENRTGVKTAINFEKFSVNYFGRKHKIEKEKRLGVAHSDDSERGAAIPEQGTGAAAE MAD2006 
6MKRILGLDLGTNSIGWAVINQDNINDKDILTGIECTGSRIIPMDAATLGDFDRGNAQSQTADRTKRRSARRLIERSHIRRERLNRVLMTMGWLPEHYSDSLDRYGKLSKGTEQKIAWKKSGNGNYEFIFKDSFNEMLDDFKNEHPDFAQRGLKIPYDWTIYYLRKKALTHPVTNQELAWILHSFNQKRGYYQRGEEEEQQPDKKIEYIPLKVKEIRETGETKGADKWFELILENDLVYKRTFKEMPDWKGKTLELIVTTDLDKDGNPVIKDGKAKYSIRAPKEDDWTLVKVRTQSDIRKSGKTVGCYIYDALIRKPDIKIRGKLVRTIEREFYREELEQILKKQKEFNQDLRDKELYNECIGVLYPNNDTHRKEIANRDDFAYLFINDIIFYQRPLKSKKSLISDCPYEERIYKDKSTGQKLTSAIKCIPKSHPTYQEFRLWQFLSYLKIYEKERTEIGKIQTDIDITDILLPDNESYAALFKKLNDEAEIKQDKILKYFPQLKKNIKNFRWNYPEDKTYPGNTTRAEMLKRLKKANIGSDFLTTEQETALWHILYSVNDKAELEKALSTFANKHGIEEEPFLNEFVKFPPFKSDYGAYSFKATNKLLSLMRRGCYWDEENIDCNTKERIEKIISGEYDPEINDRVREKTINLNGISDFQGLPTWLACYVVYGRHSEIKDITKWEKPSDIDNYLKLFKQHSLRNPIVEQVVLETLRTVRDIWKQVGRIDEIHIELGREMKNSASERKRIAEQISKNENTNLRIKAMLTEFLNPEFEIDNVRPYSPTQQEILKIYEEGVLNSGIEIDEKVKNFLKSFDKAENRPTRAEFLKYKLWLDQKYISPYTGQPIPLSKLFTSEYEIEHIIPQSRFFDDSLSNKVICEAKVNSEKGARLGHEFIKGCHEQIIDLGFGKTVKILSIEAYEEHVRKNYGHNKAKQKKLMLDEIPDTFIERQLNDSRYISKLVKTLLSNIVREDDEAEAISKNVITCTGQITDRLKHDWGVNDVWNGIILPRFQRMEKLQPGKRFTATNTNGKLIPYMPLEYQKGFSSKRIDHRHHAMDAIVIACANRNIVNYLNNESARSDAKISRYDLRNLLCDKKKQDDAGNHTWTMKIPWNTFIRDMRKALEGIIVSFKQNLRVINKSSNHITKYVDGQKKRVPQSEGDNRSIRKSLHKDTVFGLVNLREKKTVSLSEALKKPDRIVDKALKHRIKEFKTAGKTDTDIKKLLKNGPDKVEIYYFSEEKSIGKDKARHYYAARTTILSLEMDKSKSYEKAINTINNITDSGIRKILTNHLEANGNDPSKAFSADGIDEMNKNIILLNGGKNHKPIYSVRKYEEANKFAVGEIGCKSKKFVEADKGGNLYFAVYKKDDNSRSFRTIPLNEVIDRLKNKMSPVPETDEMGNRLIFWLSPNDLVYLPTADEVENGRVTLPLDKDRIYKMVSANKKQCFFMPSNTANPIISIEFSSSNKMERAITGEMIKETCIPLKTDRLGNITDFDGRIS MAD2007 
7MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDKDEKEMEKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFES YLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR MAD2008 
8MKKILGLDLGTNSIGWALIEHNFDKKEGRIDDLGVRIIPMSADILGKFDAGQSHSQTAERTGYRGVRRLYQRDNLRRERLHRVLNILDFLPEHYAEHIDFEKRLGQFKEGKEIKLNYKSNKDSKFEFIFKASYNEMLAAFKKYQPGLFYVKANGTETKIPYDWTIYYLRKKALSQPLTKQELAWIILNFNQKRGYYQLRGEEIDDDKNKQFVQLKVKEVIDSGEAVKGKKLFNVIFENGWKYDKQVVKTEDWIGRTKEFIVTTKTLKSGEIKRTYKAVDSEKDWAAIKAKTEQDIERSNKTVGEFIYEALLQDPTQKIRGKLVKTIERKFYKAELREILRKQIELQPQLFTTKLYNACIKELYPNNEAHRNSIKNRDFLYLFLDDIIFYQRPLKSQKSNISGCHLEQRIYTKINPVSGKKEEVKQAVKAIPKSHPIFQEFRIWQWLQNLKIYDKINTDKGELADVTNQLLPSEESLLDLFDYLQTKKELDQSGFIKYFIDKKLINKSEKENYRWNYVEDKKYPFAETRAQFISRLNKVKNINNISEFLNKKTRLGEKESSPFVTRIEQLWHIIYSVSDINEYKSALEKFALKHDIDKESFVANFIKFPPFKSDYGSYSKKALSKLLPLMRRGKYWNESDISNKVKQRVSDIMERVNALNLKENYNAKELAEALKTVSDDDVKKQLIKSFVPFKDKNPLKGLNTYQATYLVYGRHSEVGDIQSWKTPEDIDTYLKNFKQHSLRNPIVEQVVTETLRVVRDIWIHYGKSQLNFFNEIHVELGREMKNPADKRKQISNRNIENENTNNRIREILKDLKNDTSIEGDIRDYSPSQQDLLKIYEEGVYQNPKVDYSKVSEDEITKIRRSNSPTPKEIQRYRLWLEQGYISPYTGKPIPLSKLFTHEYQIEHIIPQSRYFDNSLSNKIICESAVNEDKDNKTAYEYLKNKSGNVINGHKLLRIEEYEAHVNRYFKNNRQKLKNLLSEDIPEGFINRQLNDSRYISKLIKGLLSNIVRQENEQEATSKNLIPVTGAVTSKLKNDWGLNDKWNELILPRFERLNQLTQTKNFTTSNTNGNTIPTVPDDLLKGFSKKRIDHRHHALDALVVACCTRNHVQYLNALNAEKANYGLRKKLLIVNEQGDFTKIFQMPWKGFTSEAKNQLEKTVISFKQNLRVINKANNKFWSFKDENGNINLDKNGRPVKKLRKQTKGDNWAIRKAMHKETVSGKSNIETPKGKIATAVRGSLADIKNEKHLGKITDVQIREVILPNHLKNYVDEKGKVKFDLAFNDEGIEDLNKNIIALNNGKKHQPIRKVKFFEVGSKFSISENENSAKSKKYVEAAKGTNLFFAVYWDEKKQKRNYETVPLNEVIAHQKQVAHLTNNERLPIQTNRKKGDFLFTLSPNDLVYVPTDAEVANKQPIDFKNLHQNQVNRIYKMVSSSGNQCFFIKDKIATSIWNKNEFSSLNKMEKDIDGNMIKERCIKLNVDRLGNITKA MAD2009 
9MKKILGLDLGTNSIGWAVVNADAITRNDGSRYLKPNSISAAGSRIIPMSADVLGNFESGITVSQTKDRTDKRMARRLHERALLRRERLLRILSLMDFLPKHFASKINRYGKFTDDSEPKLAWRKNTEGKYEFIFQDAFNEMLAEFKDKQPEIVKEGKKIPYDWTIYYLRKKALEKALSKEELSWLLLQFNQKRGYYQLRGEEEDIPQDKKIEYLAQKVVKVEATDQKKGDDIWYNVYLENGMIYRRTSKAPLDWEGKIKEFIVTTDLEKDGTPKKDKEGNIKRSFRAPQEDDWTLLKKKTEADIEKSTKTVGCYIYDSLLNNPKQKIIGKLVRTVERKFYKEELTQILKKQVELIPELRNDNLYKQCIEELYPINEAHRNTIAKTDFANLFINDILFYQRPLKSKKSQIDNCPYEEHIFIDSKTGEKKKVPVKCITKSNPLFQEFRLWQFIQNLRIYQREKEIDGKLSTDVDITSECLKSEEDYVRLFDWLNDRESIEQEELLKYLFNTKKSKNKENPYRWNYVEDKVYPCNETRATILKGLSKCGINASVLSSEMEMALWHILYSVEDKKEIETALTHFAQKQGWNGEFAIVFSKLKPFKKDYGSYSEKAIKKLLSLMRMGKYWNQDNIDKNTLDRIDKIINGEYDEKISNRVRDNAINLKDISDFRGLPVWLACYIVYDRHSEAKDCTKWNTPEEIDSYLKKFKQHSLRNPIVEQVVTETLRTVRDIWKQEGQIDEIHLELGRDLKNPADKRKKMSENILKNENTNLRIKAMLMEFMNPGMGIENVRPYSPSQQDILRIYEENALENLTKDDEEFDFISKISKQAQPTKSDIVRYKCWLEQKYRSPYTGKTISLSKLFTSAYEIEHIIPQSRYFDDSFSNKVICEAEVNKLKDRQLGHEFIEEHHGEKVQLSQGEVVEILSVDAYEKFVKENYANNRVKMKKLLMENIPDEFIERQLNDSRYISKVVKGLLSNIVREKIDDENYEPEAVSKNLISCNGAVTDRLKKDWGMNDVWNSIILPRFIRMNQITGKDCFTTTNAEGHLIPQMPLELQKGFNKKRIDHRHHAMDAIVIACTTRDHVNLLNNEAAHSKFNATRYQLQRKLRCFEKAMIDGKEREVAKEFLKPWDSFTMDSKNILENIIVSFKQNQRVINKTTNTFQHFDENGKKTFVKQGKGNSWAIRKPMHKDTVFGEINLRKVKSVSLSDAIKVPERILNKRIKEKITELKNNKVDAKNIKKYIEEYHIGGYGIDTSKIDVFYFTKETKERFFATRKSLDSSFNQAKIEDSIADSGIQKILLAHLKSKNGDAEQAFSPDGIDEMNKNIVELNNGKFHQPILKVRVYEKADKFAVGQKGNKKVKFVEAAKGTNLFFAVFEKDGKRSYLTIPLNVMIDCQKKYGNQWKQNIESYLKEKDLVEKDVQLLFILSPNDLVYLPTENELKKGITNPDKDQIYKFVSCTSNEAHFIPSFVANPIVQTTELGSNNKAQRAWNNKMIKEICI PIEVDRLGNIK MAD2010 
10MVEKILGIDLGISSLGWAVVEYDKDNDENNKIIDCGVRLFTAAETPKEKESPNKARRDARGLRRVIKRRRVRMNTIKNLLITYKLIDKTLLDEEMGMFHSQSNRVDVWKLRHDALYHLLSGDELARVLIHIAKHRGYKFLGDDESDEESGKVKKAGAELKKKFLEAGCQSVGEWLWKERGLQGKKRNKSGDYEISIPRDFLVEEIQRIFETQQKFGSTFATSELQKAYTDIAFYVKPMQSIEDMVGYCTFYPKIKSKNQDGEKRAPKASPSAEQFVILSKIFSTIVIDENKQEKKLIELKSIEQLIQIARSKETLKYKQLRKELNLAKDISFKSISDEEKTWINLVGNAKFKKILGLNYETFLKNTEISDEIAKILTYDKTFEQKETKLKNLLVNIDWIDNNHIAELAKLSFSQFNQLSLKAIKIISKIMIEGYARYDEAVQYAFENNLLPKPSHEKSILLPPLKETNIAILNPTVIRAFAQFRQVANALVSKYGSFDKVHFELAREVNTKEDRKRWEKDRDKNEKMHRQITEKLVEEGVKPSYKNILKSKLRSEQKDTCPYCQKNLHYPMIFEDGYAEIDHILPLSQSQDDSYVNKVLVHSACNQNKKNRTPFEWFQDEKKDWDTFKSYILMESTLGEKKRNYLIKENFSDPQSRKEFISRNLNDTRYMSKAIKTYCENHWKLSHDDDKLRIQVRSGKLTSTLRHQWGLDNKNRETHTHHAMDAIMIAFSTQGMVKKLSDYFAKKEAKVEKDKPVLITPIKQFKEAVEQATTLERQESIQTKAGDTITLNRLLISRPPRASVTGAAHEQTAKPYPRIKPIKNKYKRRRIPIDEDKFELFRNDKVASGNDKNFYNSSTIPRVDIYKKDDKYHVVPIYLSDMTKAEVPNKSLGTNPEGMDEKYFCFSVFKNDLIELETKATPKKPSKKLLGYFKQLNGANFILNSIHNGIIDGFVCSPITLFKQQKDMCKKCLPEDRAIGNCSQETLEFWEAENIKVPKKDFECDQGIKFAIAVRKYTIDPLGYYHEVKGEKLLGTIPQGAKKHPKRQK MAD2011 
11MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRRLKRTARRRISRRRNRLRYLQAFFEEAMTALDENFFARLQESFLVPEDKKWHRHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMIIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKNFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFLKENQKKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLSTFKGQFSEEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILDYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRDTRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTENRGKSDDVPSKEVVKKMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRYNAKSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQAFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPVVAYTVLFTHEKGKKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLAYVEQHQPEFQEILERVVDFAEVHTLAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYETRRKVVD MAD2012 
12MALNPPLPYTLGLDIGMASVGAALLTEQRILGLHVRAFDKAETAKEGDPLNKTRRESRLTRRRIRRRAHRLLRLARLFKRTGLIAAAHPEAFALPGISPWDLRADGLNRLLTPAEWAAVLYHLVKHRGFQSTRKSEAKEDEKAGQMLSGVSANQQRMKEKGWRTVGEMAARDEAFAEAKRNKGGAYTHTFARTDLVAELKLLFKQQAGFGNPHVGVDFENDVEQLLLARRPALAGDALLKLVGKCPFEPTEYRAPKASYSAERFVWLTKLNNLRISEVGEQRALTAGERQILLNQPYLLAKFTYKQARQRLSLADTAKFTVLTYRGDKDPESTTFFEAKAYHELRKAYEKAGLESRWQRDALDPTRLDRLAWALTCYKTDDDIRAHLAEHGVEPEITEAVLGESFDKFVGLSLKALGKILPFMEQGQRYDEAVQSAGYAHHSQLNRDTTKNQYLPPPDKDQIRNPVVYRALNQARKLVNAIVREYGSPAAIHIELARDLSKPMDERRKIEREQKEFQERKAKDREAFIEQFSFDPKGLDLQKYRLYREQMSQCAYSQKAIDVTRLFEPGYAEIDHALPYSRSYDDGQNNKVLVLTAENRNKGNRTPYEYLDGASDSPQWQRFEAWVLQNKAYRRAKRDRLLRKHFGEDEAEGFRERNLIDTRYICRAFKTMVEDHLQWHADSDAKNRCVVVAGQLTSLLRARWGLIKVRENGDLHHALDAAVIAAANRSLVKRMADYSKRNELAQVRDRYIDPATGEILDIAAMRQVEEHFPSPWPHFRSELLAWLSPNPAHGLDGLAHYPPEELEHLRPMRVSRAPTRRGLGAAHQETIRSVGREGRLLADGQSAVKTPLTAIKLKDLENIVGYSHSHNHAMIEAIRKRLETNGNDGAKAFKMPLFKPSATNGYDADKSHVGETDQRAPQIRSVKLLATQKSGIPIRKGIANNGSMLRVDVFGKGGKFYAVPVYVADAARAELPYRAVAAFKPENEWPEMDEKQFMFSLHPNDWVTVKLKAETISGYFAGMDRSTGAISVWAHDRNQSIGKDGQWRGVGMKTALAVEKYHVDLLGNLHRVHTEMRLPLHGSKASKD MAD2013 
13MSKILGLDLGTNSIGWALIDDNQNRILGVGSRIFPMGVENLGDGDGEVSKNASRTGARGVRRQFFRRRLRKKVLLKALSEHNMCPMVTIDFEDWKKSKQFPSEKLSNWFSLNPYELRHKALSEKLTLEEIGRILYHLIQRRGFLSNSRKGGSDDGAIFKGNPKEGKIGITETQESIQDKSLGSYLFEIYPKENQPPEGGLERIRNRYTIRKMYVDEFELIWNKQSQFHSSLNDDLKTLLGGRKLDGYKEDGILFHQRPLRSQKHLVGNCSFEPTKTKCPISAIPFEMFRIWQWVNTLEYNGKKITQEEKEKIVEFMCANEKPDFKRIRKVIGKESAEFKFNYKDDDKIVGTHTISNLSNKKFFGKAWFDFSEKEQEDIWHVLYFFDSKSNLKDYAIKHWNFNEAQASDVSKFNVKDGYSSLSRKAISNILPFLKLGFTYDVSVVLGGIKNVFGSEWEKLSEEKRNYLIDNVEGIVRSKIKGGFIDVIKGILRNDYSISDNQLRKLYHHSATIDAVELLDKLPVGKEADKEIQAIRNPIVITALFELRKLVNELIDEHGKLDEIKVEMARDLKISKSQRNKIRREQKRLERENDRVKDRLVENNIRITHDNILLYKLWEECKKTCPYTGKPISVTQLFSGEVQIEHIHPWSRSLNDSFSNKTLCYADENRKKGNLTPFEFYGSDETNWSAIKERALKLFSDTKEYPNAYQKFKRFVQVKFDDDFSSRQLNDTRYISKEAKNYLSRICKNVIVSPGQATSNLRQKWGMNNILSDENEKTRDDHRHHAVDALVMACTKVSYVQELAKWNRYNRNSELKNFPLPWETFRFDAEKAVEKILISHKKVSNDITVRTHITEKNGIKYKNVGVAARGQLHKETVFGKRTFNGEEAYHVRKSIDSLETAKQIEKVVDETIKQLILKRVNELGGFVKDKVPANTFFIVDEKGIKQPQLFLPNKNGQPIPVLKVRVKESVGRAEQLKANVNQWVNPRNNHHVLIYKDEHGNLKEDVVTFWTVVERKRTGQSIYQLPINGKEIITSLHTNDMFIIGLNEDEINWELIDFNLINHHLYRVQKTSKKEKSFEFNFRLGIASSLDNKSQEISIQSFKKWIELNPIKVKISVSG KIQKV

Example 3 Vector Cloning, MAD-Series RGN Library Construction and PCR

The mined MAD-series RGN coding sequences were cloned into a pUC57 vector with T7-promoter sequence attached to the 5′-end of the coding sequence and a T7-terminator sequence attached to the 3′-end of the coding sequence. 100 ng of the plasmid mixture was transformed into E. cloni® SUPREME electrocompetent solo cells (Lucigen, Middleton, Wis.). After the cells were recovered in 5 mL of recovery medium at 37° C. for 1 hr in a shaking incubator, 1 mL of 50% glycerol was added and the cells were stored at −80° C. as 100 μL aliquots.

The stored cells were diluted in phosphate buffered saline and spread on LB agar plates with 100 μg/mL of carbenicillin. The cells were then grown overnight at 37° C. in an incubator. Colonies were picked and inoculated into 1 mL of LB medium (100 μg/mL of carbenicillin) in 96-well culture blocks. Cultures were grown overnight in a shaking incubator at 37° C. Next, 1 μL of the cells was diluted into 500 μL of PCR grade water, and 25 μL aliquots of diluted cultures were boiled for 5 min at 95° C. using a thermal cycler. The boiled cells were used to PCR amplify the different mined MAD-series RGN coding sequences. The rest of the cultures were stored at −80° C. with added glycerol at 10% v/v concentration.

First, Q5 Hot Start 2× master mix reagent (NEB, Ipswich, Mass.) was used to amplify the mined MAD-series RGN sequences using the boiled cells as a source of mined MAD-series RGN templates. The forward primer 5′-TTGGGTAACGCCAGGGTTTT [SEQ ID No. 49] and reverse primer 5′-TGTGTGGAATTGTGAGCGGA [SEQ ID No. 50] amplified the sequences flanking the mined MAD-series RGN in the pUC57 vector including the T7-promoter and T7-terminator components at the 5′- and 3′-end of the mined MAD-series RGNs, respectively. 1 μM primers were used in a 10 μL PCR reaction using 3.3 μL of boiled cell sample as template in 96-well PCR plates. The PCR conditions shown in Table 3 were used:

TABLE 3
STEP              TEMPERATURE   TIME
DENATURATION      98° C.        30 SEC
30 CYCLES         98° C.        10 SEC
                  66° C.        30 SEC
                  72° C.        2.5 MIN
FINAL EXTENSION   72° C.        2 MIN
HOLD              12° C.

Example 4 gRNA Library Construction

The functional gRNAs associated with each RGN can be difficult to predict since multiple RNAs may be needed for the RGN to function and the length of transcribed RNA can also be highly variable. Therefore, a 384-member library of gRNAs was created for each RGN. The gRNA consisted of a variable spacer sequence, a CRISPR repeat sequence, a linker sequence, and the tracrRNA sequence. FIG. 3 is a schematic of gRNA designs for the mined MAD-series RGNs. The tracrRNA was found by identifying the anti-repeat sequence. The crRNA was covalently linked to the tracrRNA using a GAAA linker. The initial gRNA design was optimized by creating a library of gRNAs by truncating the 5′ region, the 3′ region and the repeat/anti-repeat duplex. To find the optimal gRNA length, different lengths of spacer, repeat:anti-repeat duplex and 3′ end of the tracrRNA were included. The library also was subdivided into six pools based on the overall length of the gRNA. This enabled identification of the shortest gRNA that is optimal for nuclease activity. These gRNAs were then cloned downstream of the T7 promoter.
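The library layout described above (spacer, repeat, GAAA linker, tracrRNA, with truncations and six length-based pools) can be sketched as follows. This is only an illustration under stated assumptions: the placeholder sequences, the specific truncation steps, and the helper names `assemble_grna`, `build_library` and `pool_by_length` are hypothetical and are not taken from the disclosure; they were chosen only so that the combinations multiply out to 384 designs in six pools of 64.

```python
from itertools import product

LINKER = "GAAA"  # linker joining crRNA and tracrRNA, as stated in the text


def assemble_grna(spacer, repeat, tracr):
    """Concatenate spacer + repeat + linker + tracrRNA (5' to 3')."""
    return spacer + repeat + LINKER + tracr


def build_library(spacer, repeat, tracr, spacer_trims, duplex_trims, tail_trims):
    """Enumerate truncated designs.

    s nt are removed from the 5' end of the spacer, d nt from each side of
    the repeat:anti-repeat duplex, and t nt from the 3' end of the tracrRNA.
    All trim values are hypothetical.
    """
    library = []
    for s, d, t in product(spacer_trims, duplex_trims, tail_trims):
        library.append(assemble_grna(
            spacer[s:],                 # shorter spacer
            repeat[:len(repeat) - d],   # shorter repeat (3' side)
            tracr[d:len(tracr) - t],    # shorter anti-repeat and 3' tail
        ))
    return library


def pool_by_length(library, n_pools=6):
    """Sort designs by overall length and cut them into equal-sized pools."""
    ordered = sorted(library, key=len)
    size = len(ordered) // n_pools
    return [ordered[i * size:(i + 1) * size] for i in range(n_pools)]


# Placeholder sequences (lengths only; NOT real repeat or tracrRNA sequences).
library = build_library(
    spacer="N" * 23, repeat="R" * 36, tracr="T" * 80,
    spacer_trims=range(0, 4),       # 4 spacer lengths (assumed)
    duplex_trims=range(0, 16, 2),   # 8 duplex lengths (assumed)
    tail_trims=range(0, 36, 3),     # 12 tail lengths (assumed)
)
pools = pool_by_length(library)
print(len(library), [len(p) for p in pools])
```

Pooling by overall length means that, once the most active pool is known, only its 64 members need to be tested individually to find the shortest functional design.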

The target library was designed based on an assumption that the PAMs of these nucleases will reside on the 3′ end. Two artificial protospacers were selected with different GC content. Since PAM sequences can range from 3-7 or 3-10 nucleotides in length, three different PAM target libraries were prepared for each protospacer. Library 1 contained the PAM NNNNATGC; library 2 consisted of the PAM ATNNNNGC; and library 3 contained the PAM ATGCNNNN. The sliding PAM library ensures that PAMs ranging from 4 nt to 8 nt are captured. The target libraries were cloned into a target plasmid that contains the sequences necessary for next generation sequencing of uncut targets.
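To make the size of each sliding PAM library concrete, the three templates named above can be expanded by replacing every N with each of the four bases. The sketch below is illustrative only; the function name `expand_pam_template` is an assumption, and it shows nothing about how the physical libraries were synthesized.

```python
from itertools import product

BASES = "ACGT"


def expand_pam_template(template):
    """Return every sequence obtained by replacing each N with A, C, G or T."""
    slots = [BASES if ch == "N" else ch for ch in template]
    return ["".join(combo) for combo in product(*slots)]


TEMPLATES = {
    "library 1": "NNNNATGC",
    "library 2": "ATNNNNGC",
    "library 3": "ATGCNNNN",
}

libraries = {name: expand_pam_template(t) for name, t in TEMPLATES.items()}
for name, members in libraries.items():
    # Four randomized positions give 4**4 = 256 members per library.
    print(name, len(members))
```

Each template fixes four positions and randomizes four, so each library holds 256 PAM variants per protospacer, with the randomized window sliding across the eight positions 3′ of the protospacer.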

Example 5 In Vitro Transcription and Translation for Production of MAD-Series Nucleases and gRNAs

The MAD-series RGNs were tested for activity by in vitro transcription and translation (txtl). Both the gRNA plasmid and nuclease plasmid were included in each txtl reaction. A PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.) was used to produce mined MAD-series RGNs from the PCR-amplified MAD-series RGN library and also to produce the gRNA libraries. In each well in a 96-well plate, the reagents listed in Table 4 were mixed to start the production of mined MAD-series RGNs and gRNAs:

TABLE 4
    REAGENTS                           VOLUME (μl)
1   SolA (NEB kit)                     3.3
2   SolB (NEB kit)                     2.5
3   PCR amplified gRNA subpool         1
4   Murine RNase inhibitor (NEB)       0.2
5   Water                              0.3
6   PCR amplified T7 MAD-series RGNs   1.0

A master mix containing all reagents except the PCR-amplified T7-MAD-series RGNs was prepared on ice in a volume sufficient for the number of 96-well plates in the assay. After 7.3 μL of the master mix was distributed into each well of the 96-well plates, 1 μL of the PCR-amplified MAD-series RGNs under the control of the T7 promoter was added. The 96-well plates were sealed and incubated for 4 hrs at 37° C. in a thermal cycler. The plates were kept at room temperature until the target pool was added to perform the target depletion reaction.
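The master mix arithmetic can be checked with a short calculation: the five master-mix reagents in Table 4 sum to 7.3 μL per well, and the 1 μL of RGN template is added separately. The sketch below scales those volumes to a number of wells; the 10% pipetting overage and the function name `master_mix_volumes` are assumptions for illustration, not values from the disclosure.

```python
# Per-well volumes in microliters, copied from Table 4 (RGN template excluded).
MASTER_MIX_UL = {
    "SolA (NEB kit)": 3.3,
    "SolB (NEB kit)": 2.5,
    "PCR amplified gRNA subpool": 1.0,
    "Murine RNase inhibitor (NEB)": 0.2,
    "Water": 0.3,
}
RGN_TEMPLATE_UL = 1.0  # added to each well after the master mix


def master_mix_volumes(n_wells, overage=0.10):
    """Scale each reagent to n_wells, with a fractional overage (assumed 10%)."""
    scale = n_wells * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in MASTER_MIX_UL.items()}


per_well = round(sum(MASTER_MIX_UL.values()), 2)
print("master mix per well (uL):", per_well)
print("reaction volume per well (uL):", round(per_well + RGN_TEMPLATE_UL, 2))
print(master_mix_volumes(96))
```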

After 4 hours of incubation to allow production of the mined MAD-series RGNs and gRNAs, 4 μL of the target library pool (10 ng/μL) was added to the in vitro transcription/translation reaction mixture and allowed to deplete for 30 min, 3 hrs or overnight at 37° C. The target depletion reaction mixtures were diluted into PCR-grade water that contains RNAse A and then boiled for 5 min at 95° C. The mixtures were then amplified and sequenced. The PCR conditions are shown in Table 5:

TABLE 5
STEP              TEMPERATURE   TIME
DENATURATION      98° C.        30 SEC
6 CYCLES          98° C.        10 SEC
                  61° C.        30 SEC
                  72° C.        10 SEC
22 CYCLES         98° C.        10 SEC
                  72° C.        10 SEC
FINAL EXTENSION   72° C.        2 MINUTES
HOLD              12° C.
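Because only uncut targets are amplified and sequenced, a PAM that supports cleavage should be under-represented among the reads after the depletion reaction. One common way to score this, shown here as a sketch and not as the analysis actually used, is a log2 ratio of each PAM's read fraction in a reference sample to its fraction in the treated sample. The reference sample, the pseudocount, the read counts and the function name `log2_depletion` are all assumptions introduced for illustration.

```python
import math


def log2_depletion(control_counts, treated_counts, pseudocount=1.0):
    """Return log2(control fraction / treated fraction) for every PAM.

    Positive scores mean the PAM lost reads after nuclease treatment.
    A pseudocount keeps PAMs with zero reads from dividing by zero.
    """
    pams = set(control_counts) | set(treated_counts)
    control_total = sum(control_counts.values()) + pseudocount * len(pams)
    treated_total = sum(treated_counts.values()) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        control_frac = (control_counts.get(pam, 0) + pseudocount) / control_total
        treated_frac = (treated_counts.get(pam, 0) + pseudocount) / treated_total
        scores[pam] = math.log2(control_frac / treated_frac)
    return scores


# Invented read counts, for illustration only.
control = {"AAGC": 1000, "AAAC": 1000, "AATT": 1000}
treated = {"AAGC": 100, "AAAC": 150, "AATT": 1000}

for pam, score in sorted(log2_depletion(control, treated).items()):
    print(pam, round(score, 2))
```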

Example 5 Results

The screen was performed on three target pools containing NNNNATGC, ATNNNNGC and ATGCNNNN PAMs. FIG. 4 shows the depletion map of MAD2001 on targets containing NNNNATGC 3′ PAMs as measured in vitro; FIG. 5 shows the depletion map of MAD2007 on targets containing NNNNATGC 3′ PAMs (top) and ATNNNNGC 3′ PAMs (bottom) as measured in vitro; FIG. 6 shows the depletion map of MAD2008 on targets containing the NNNNATGC 3′ PAMs as measured in vitro; FIG. 7 shows the depletion map of MAD2009 on targets containing the NNNNATGC 3′ PAMs as measured in vitro; FIG. 8 shows the depletion map of MAD2011 on targets containing the NNNNATGC 3′ PAMs as measured in vitro; and FIG. 9 shows the enrichment of targets cleaved by MADs in the form of a sequence logo, summarizing the PAMs of the mined MAD-series RGNs that were active in the in vitro screen. Thus, as seen in FIGS. 4-9, the MAD2001, MAD2007, MAD2008, MAD2009 and MAD2011 nucleases were active. MAD2001 cut targets containing NNRC PAMs, MAD2007 preferred NNNSR PAMs, and MAD2008 and MAD2009 depleted targets with NNAA PAMs. PAMs NNNNATGC, ATNNNNGC and ATGCNNNN are substantially less restrictive than either a NGG or TTTV PAM (see FIG. 11) for human genome editing. Lastly, MAD2011 has an NGG PAM similar to SpCas9. The gRNA subpools that were most active were identified. Since each gRNA subpool contained 64 different combinations of gRNAs, we tested each gRNA within the most active or shortest subpool to identify the optimal gRNA sequence. The three gRNAs that showed the highest depletion in the in vitro assay were identified. These sequences are listed in Table 6.

TABLE 6 gRNA Spacer Repeat Linker tracrRNA MAD2001- 23 GTTTGAGAGTGTTGAAA GATAGACAAATGTGTCTTTGACAACAC gRNAvl GTCAAATAAGAGTAAGTTCAAATAAGGCATTGCCGTAATC CGGACCAATC GTTCTTATGAACCCCGCAGTTGGCGGGA[SEQ ID No. 20] AACCTTCTGTTGTCA [SEQ ID No. 21] MAD2001- 23GTTTGAGAGTGTT GAAA GACAAATGTGTCTTTGACAACACAAGTT gRNAv2 GTCAAATAAGAGTCAAATAAGGCATTGCCGTAATCGTTCTT CGGACC ATGAACCCCGCAGTTGGCGGGAAACCT[SEQ ID No. 22] TCTGTTGTCA [SEQ ID No. 23] MAD2001- 23 GTTTGAGAGTGTTGAAA AACACAAGTTCAAATAAGGCATTGCCG gRNAv3 [SEQ ID No. 24]TAATCGTTCTTATGAACCCCGCAGTTGG CGGGAAACCTTC [SEQ ID No. 25] MAD2007- 23GTTTTAGTCGTCTG GAAA ATAACTTTACCAGTGAATATCAGACGGC gRNAvl TTATTTATTGGTAATAAGATAAAGCTATAAGCTGTGGGGTC GGTTAT GCGCATCCCCAATTTCGCGCACGAGCGT[SEQ ID No. 26] TAGCTCGTT [SEQ ID No. 27] MAD2007- 23 GTTTTAGTCGTCTGGAAA ACCAGTGAATATCAGACGGCTAAGATA gRNAv2 TTATTTATTGGTAAGCTATAAGCTGTGGGGTCGCGCATC [SEQ ID No. 28] CCCAATTTCGCGCACGAGCGTTAGCTCGTT [SEQ ID No. 29] MAD2007- 23 GTTTTAGTCG GAAACGGCTAAGATAAAGCTATAAGCTGTGG gRNAv3 [SEQ ID No. 30]GGTCGCGCATCCCCAATTTCGCGCACGA GCGTTAGC [SEQ ID No. 31] MAD2008- 23CCTGTGAATAGTC GAAA TATAAAAAATAATTATAGAACCAAACT gRNAvl AAACCAATTATGAAAAAAATATTAGGAC [SEQ ID No. 32] TTGACTTAGGAACCAACTCTATTG[SEQ ID No. 33] MAD2008- 23 CCTGTGAATAGTC GAAATATATAAAAAATAATTATAGAACCAAA gRNAv2 AAC CTAACCAATTATGAAAAAAATATTAGG[SEQ ID No. 34] ACTTGACTTAGGAACCAACTCTATTG [SEQ ID No. 35] MAD2008- 23CCTGTGAATAGT GAAA TAAAAAATAATTATAGAACCAAACTAA gRNAv3 [SEQ ID No. 36]CCAATTATGAAAAAAATATTAGGACTT GA [SEQ ID No. 37] MAD2009- 23 GTTGTGAATTGCTGAAA AAAGCAATTCACAATAAGGATTATTCC gRNAvl TT GTTGTGAAAACATTTAAGGCGGTGCGA[SEQ ID No. 38] AAGCATCGTCCT [SEQ ID No. 39] MAD2009- 23 GTTGTGAATTGCTGAAA AAAGCAATTCACAATAAGGATTATTCC gRNAv2 TT GTTGTGAAAACATTTAAGGCGGTGCGA[SEQ ID No. 38] AAA [SEQ ID No. 40] MAD2009- 23 GTTGTGAATTGC GAAAGCAATTCACAATAAGGATTATTCCGTTG gRNAv3 [SEQ ID No. 41]TGAAAACATTTAAGGCGGTGCGAAAGC ATCGTC [SEQ ID No. 42] MAD2011- 23GTTTTAGAGCTA GAAA TAGCAAGTTAAAATAAGGCTAGTCCGTT gRNAvl [SEQ ID No. 
43]ATCAACTTGAAAAAGTGGCACAGAGTC GGTGCT [SEQ ID No. 44] MAD2011- 23GTTTTAGAGC GAAA GCAAGTTAAAATAAGGCTAGTCCGTTAT gRNAv2 [SEQ ID No. 45]CAACTTGAAAAAGTGGCACAGAGTCGG TGCT [SEQ ID No. 46] MAD2011- 23GTTTTAGAGTCAT GAAA CCATTTTAAACGAAAAACTCCTCTAAAA gRNAv3 GTTGTTTAGAATGCGATTGCAGCTTATCGTAAAAATGAAG G [SEQ ID No. 47]GAACCTATGATTAAAGAAAGCCGACTG CA [SEQ ID No. 48]
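The PAM preferences reported in the Example 5 results (NNRC, NNNSR, NNAA and NGG) are written with IUPAC degenerate codes, where R is A or G, S is C or G, and N is any base. The following sketch checks whether the sequence immediately 3′ of a protospacer satisfies each preference. It assumes the PAM is read 5′ to 3′ starting at the first base after the protospacer; the function names are hypothetical.

```python
# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# PAM preferences as reported in the text for each nuclease.
PAM_PREFERENCES = {
    "MAD2001": "NNRC",
    "MAD2007": "NNNSR",
    "MAD2008": "NNAA",
    "MAD2009": "NNAA",
    "MAD2011": "NGG",
}


def matches_pam(sequence, pattern):
    """True if the start of `sequence` satisfies the degenerate `pattern`."""
    sequence = sequence.upper()
    if len(sequence) < len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(sequence, pattern))


def nucleases_for(downstream):
    """List the nucleases whose PAM preference matches the 3' flanking sequence."""
    return [name for name, pattern in PAM_PREFERENCES.items()
            if matches_pam(downstream, pattern)]


print(nucleases_for("GAGCTTAA"))  # flank beginning with a GAGC PAM
```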

Using the PAM information and top three gRNA designs, the activity of MAD2001, MAD2007, MAD2008, MAD2009 and MAD2011 was tested in HEK293T cells. The cells were co-transfected with the nuclease plasmid and gRNA plasmid. The nucleases were expressed from a strong CAG promoter whereas gRNAs were expressed from a U6 promoter. The cells were analyzed for indels by a T7E1 assay. As shown in FIG. 10, four out of the five MADs showed activity. MAD2001 was active on multiple endogenous targets, showing up to 20% indels on a target containing a GAGC PAM. This is the first demonstration of an archaeal CRISPR nuclease that is active in mammalian cells. MAD2007, MAD2008 and MAD2011 showed lower and variable levels of indels on the targets tested. FIG. 11 illustrates the human genome coverage provided by the newly-discovered nucleases for precise editing in human cells.
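The disclosure does not state how indel percentages were derived from the T7E1 gels. A widely used estimate converts the fraction of cleaved PCR product into an indel percentage as 100 × (1 − √(1 − f)), where f is the cleaved fraction; the sketch below applies that formula. The band intensities and the function name `t7e1_indel_percent` are invented for illustration and are not data from this example.

```python
import math


def t7e1_indel_percent(uncut, cleaved_bands):
    """Estimate percent indels from gel band intensities.

    uncut: intensity of the full-length (uncleaved) band.
    cleaved_bands: intensities of the cleavage product bands.
    Uses the commonly cited estimate 100 * (1 - sqrt(1 - fraction_cleaved)).
    """
    cleaved = sum(cleaved_bands)
    total = uncut + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Invented intensities: 36% of the product is cleaved.
print(round(t7e1_indel_percent(64.0, [18.0, 18.0]), 1))
```

The square root accounts for the fact that T7E1 cuts heteroduplexes, so the cleaved fraction overstates the fraction of mutant strands.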

In addition, nickase and nuclease dead variants of the MAD-series nucleases, namely MAD2001, MAD2007, MAD2008, MAD2009 and MAD2011, were also identified, where these nickases and dead variants are used for various nickase-based precise editing applications. The sequences of the nickase and nuclease dead variants are listed in Table 7 below, where the amino acid residues that vary from the wildtype MAD2001 [SEQ ID No. 1] nuclease amino acid sequence (for SEQ ID Nos. 14-16) and MAD2007 [SEQ ID No. 7] nuclease amino acid sequence (for SEQ ID Nos. 17-19) are underlined and bolded.

TABLE 7 Mined SEQ MAD-series ID Name No. Sequence MAD2001 14MKNTNEDYYLGL A IGTDSVGWAVTDKEYNILEFRRKPMWGIHLFEGG Nickase 1STAQKTRVYRTSRRRLKRRAERIALLRDIFSEEIGKVDPGFFERLDESDLHLEDRVTSQKNSLFDDPEFNDKDLHKRFPTIYHLRRHLMHSNRKEDIRLIYLAAHHIIKFRGHFLYKGIGDEEIPSFEIVLNSLIDNLRDEYGMELEVSDRDLVKALLSDFSIGIREKSRELSSCLNAESENEKALVDFISGKKTNMKKLFDDEALDKMSFSLRDSGFEDQLRENEGVLGPERVHTLELSRQIFEWARLSSILKDSDSISEAKIKDYDQHREDLRMLKRAVKKYAPDKYSEVFKSKEHTGNYCSYVYVCGKGLPDKKCSTEEFQKYLKKILDDSGVRDDEEFKTLIQRLDAGILCPKQRTGENSVIPYSVHRKELIGILNNAAEHYPSLSRKGEDGFSSIDKILMLEEFRIPYYVGPLDDRSSRSWLIRNSFEAITPWNFNEIVDEDETSERFIGNLTSMCTYLGGEKVLPKNSLLYSRFMLYNEINNLRVGGEKIPAALKNKMVSELFANRATSSKVTLKELKAFLKGEGVLTDADEISGIDDGVKSTLRSEILIRKIIGDKISDREMAEEIVRILTVFGDERRRSKAKLKKEFSDKLTEKEIEKLSSLKFDGWGRLSEKFLTGLRQEVNGRSMSIIEILEDTNYNLQETLSKYSFNEIIDSYNEVLTSGPRSISYDILKDSYLSPAVKRGVWRALSVVKDILKAVGRPPKKIFVETTREEREKKRTESRKDALMYLYKSCKETEWEKRLDSVEESSLRNRSLYLYYTQLGKCMYCGKNIDIGELNTDLADRDHIYPQSKTKDDSIRNNLVLVCRGCNQAKGDRYPLPQEWVSRMHAFWTMLKDKGYISSEKYRRLTRRGELTEEEFGAFINRQLVETSQSAKAVITVLKNAFKDSDIVYVKGSNVSDFRSSYNFIKCRSVNDYHHAKDAYLNIVVGNVLDTKFTKNPSYVLKNREQYNIGRMYDRNVSRFGVDAWVAGDRGSIATVRKYMRRNNILFTRYATKSKGALFKETVHRKKEGLFERKKGLETEKYGGYSDISTSYLTLLEYDKGKKRIRSLEIVPTYFANTRPKEEDVIRFFSETRGLANVRVVMPEVRMKSLFEYRGFRFHVTGSNGKGRFWISSAIQLLLPENLYAYCKSIENNEKDSQRRSEKPLQNYGFSSEMNIELFKCLMDKAAKPPYDVKLSTLSKNLEEGFEKFKALELGPQVKVLQQILDIYSCDRKSGDLSVLGSARNAGRLDMNGVLSEADGEQVTMICQSP SGLFEKRVPMNEK MAD200115 MKNTNEDYYLGLDIGTDSVGWAVTDKEYNILEFRRKPMWGIHLFEGG Nickase 
2STAQKTRVYRTSRRRLKRRAERIALLRDIFSEEIGKVDPGFFERLDESDLHLEDRVTSQKNSLFDDPEFNDKDLHKRFPTIYHLRRHLMHSNRKEDIRLIYLAAHHIIKFRGHFLYKGIGDEEIPSFEIVLNSLIDNLRDEYGMELEVSDRDLVKALLSDFSIGIREKSRELSSCLNAESENEKALVDFISGKKTNMKKLFDDEALDKMSFSLRDSGFEDQLRENEGVLGPERVHTLELSRQIFEWARLSSILKDSDSISEAKIKDYDQHREDLRMLKRAVKKYAPDKYSEVFKSKEHTGNYCSYVYVCGKGLPDKKCSTEEFQKYLKKILDDSGVRDDEEFKTLIQRLDAGILCPKQRTGENSVIPYSVHRKELIGILNNAAEHYPSLSRKGEDGFSSIDKILMLEEFRIPYYVGPLDDRSSRSWLIRNSFEAITPWNFNEIVDEDETSERFIGNLTSMCTYLGGEKVLPKNSLLYSRFMLYNEINNLRVGGEKIPAALKNKMVSELFANRATSSKVTLKELKAFLKGEGVLTDADEISGIDDGVKSTLRSEILIRKIIGDKISDREMAEEIVRILTVFGDERRRSKAKLKKEFSDKLTEKEIEKLSSLKFDGWGRLSEKFLTGLRQEVNGRSMSIIEILEDTNYNLQETLSKYSFNEIIDSYNEVLTSGPRSISYDILKDSYLSPAVKRGVWRALSVVKDILKAVGRPPKKIFVETTREEREKKRTESRKDALMYLYKSCKETEWEKRLDSVEESSLRNRSLYLYYTQLGKCMYCGKNIDIG ELNTDLADRD AIYPQSKTKDDSIRNNLVLVCRGCNQAKGDRYPLPQEWVSRMHAFWTMLKDKGYISSEKYRRLTRRGELTEEEFGAFINRQLVETSQSAKAVITVLKNAFKDSDIVYVKGSNVSDFRSSYNFIKCRSVNDYHHAKDAYLNIVVGNVLDTKFTKNPSYVLKNREQYNIGRMYDRNVSRFGVDAWVAGDRGSIATVRKYMRRNNILFTRYATKSKGALFKETVHRKKEGLFERKKGLETEKYGGYSDISTSYLTLLEYDKGKKRIRSLEIVPTYFANTRPKEEDVIRFFSETRGLANVRVVMPEVRMKSLFEYRGFRFHVTGSNGKGRFWISSAIQLLLPENLYAYCKSIENNEKDSQRRSEKPLQNYGFSSEMNIELFKCLMDKAAKPPYDVKLSTLSKNLEEGFEKFKALELGPQVKVLQQILDIYSCDRKSGDLSVLGSARNAGRLDMNGVLSEADGEQVTMICQSP SGLFEKRVPMNEK dMAD200116 MKNTNEDYYLGL A 
IGTDSVGWAVTDKEYNILEFRRKPMWGIHLFEGGSTAQKTRVYRTSRRRLKRRAERIALLRDIFSEEIGKVDPGFFERLDESDLHLEDRVTSQKNSLFDDPEFNDKDLHKRFPTIYHLRRHLMHSNRKEDIRLIYLAAHHIIKFRGHFLYKGIGDEEIPSFEIVLNSLIDNLRDEYGMELEVSDRDLVKALLSDFSIGIREKSRELSSCLNAESENEKALVDFISGKKTNMKKLFDDEALDKMSFSLRDSGFEDQLRENEGVLGPERVHTLELSRQIFEWARLSSILKDSDSISEAKIKDYDQHREDLRMLKRAVKKYAPDKYSEVFKSKEHTGNYCSYVYVCGKGLPDKKCSTEEFQKYLKKILDDSGVRDDEEFKTLIQRLDAGILCPKQRTGENSVIPYSVHRKELIGILNNAAEHYPSLSRKGEDGFSSIDKILMLEEFRIPYYVGPLDDRSSRSWLIRNSFEAITPWNFNEIVDEDETSERFIGNLTSMCTYLGGEKVLPKNSLLYSRFMLYNEINNLRVGGEKIPAALKNKMVSELFANRATSSKVTLKELKAFLKGEGVLTDADEISGIDDGVKSTLRSEILIRKIIGDKISDREMAEEIVRILTVFGDERRRSKAKLKKEFSDKLTEKEIEKLSSLKFDGWGRLSEKFLTGLRQEVNGRSMSIIEILEDTNYNLQETLSKYSFNEIIDSYNEVLTSGPRSISYDILKDSYLSPAVKRGVWRALSVVKDILKAVGRPPKKIFVETTREEREKKRTESRKDALMYLYKSCKETEWEKRLDSVEESSLRNRSLYLYYTQLGKCMYCGKNIDIG ELNTDLADRD AIYPQSKTKDDSIRNNLVLVCRGCNQAKGDRYPLPQEWVSRMHAFWTMLKDKGYISSEKYRRLTRRGELTEEEFGAFINRQLVETSQSAKAVITVLKNAFKDSDIVYVKGSNVSDFRSSYNFIKCRSVNDYHHAKDAYLNIVVGNVLDTKFTKNPSYVLKNREQYNIGRMYDRNVSRFGVDAWVAGDRGSIATVRKYMRRNNILFTRYATKSKGALFKETVHRKKEGLFERKKGLETEKYGGYSDISTSYLTLLEYDKGKKRIRSLEIVPTYFANTRPKEEDVIRFFSETRGLANVRVVMPEVRMKSLFEYRGFRFHVTGSNGKGRFWISSAIQLLLPENLYAYCKSIENNEKDSQRRSEKPLQNYGFSSEMNIELFKCLMDKAAKPPYDVKLSTLSKNLEEGFEKFKALELGPQVKVLQQILDIYSCDRKSGDLSVLGSARNAGRLDMNGVLSEADGEQVTMICQSP SGLFEKRVPMNEK MAD200717 MENYRQKHRFVLAT A LGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGA Nickase 
1KKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDKDEKEMEKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR MAD2007 18MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGA Nickase 
2KKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDKDEKEMEKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVDAIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR dMAD2007 19 MENYRQKHRFVLAT ALGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDKDEKEMEKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDS TIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVD A 
IIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR MAD2008 51 MKKILGL ALGTNSIGWALIEHNFDKKEGRIDDLGVRIIPMSADILGKFD Nickase 1AGQSHSQTAERTGYRGVRRLYQRDNLRRERLHRVLNILDFLPEHYAEHIDFEKRLGQFKEGKEIKLNYKSNKDSKFEFIFKASYNEMLAAFKKYQPGLFYVKANGTETKIPYDWTIYYLRKKALSQPLTKQELAWIILNFNQKRGYYQLRGEEIDDDKNKQFVQLKVKEVIDSGEAVKGKKLFNVIFENGWKYDKQVVKTEDWIGRTKEFIVTTKTLKSGEIKRTYKAVDSEKDWAAIKAKTEQDIERSNKTVGEFIYEALLQDPTQKIRGKLVKTIERKFYKAELREILRKQIELQPQLFTTKLYNACIKELYPNNEAHRNSIKNRDFLYLFLDDIIFYQRPLKSQKSNISGCHLEQRIYTKINPVSGKKEEVKQAVKAIPKSHPIFQEFRIWQWLQNLKIYDKINTDKGELADVTNQLLPSEESLLDLFDYLQTKKELDQSGFIKYFIDKKLINKSEKENYRWNYVEDKKYPFAETRAQFISRLNKVKNINNISEFLNKKTRLGEKESSPFVTRIEQLWHIIYSVSDINEYKSALEKFALKHDIDKESFVANFIKFPPFKSDYGSYSKKALSKLLPLMRRGKYWNESDISNKVKQRVSDIMERVNALNLKENYNAKELAEALKTVSDDDVKKQLIKSFVPFKDKNPLKGLNTYQATYLVYGRHSEVGDIQSWKTPEDIDTYLKNFKQHSLRNPIVEQVVTETLRVVRDIWIHYGKSQLNFFNEIHVELGREMKNPADKRKQISNRNIENENTNNRIREILKDLKNDTSIEGDIRDYSPSQQDLLKIYEEGVYQNPKVDYSKVSEDEITKIRRSNSPTPKEIQRYRLWLEQGYISPYTGKPIPLSKLFTHEYQIEHIIPQSRYFDNSLSNKIICESAVNEDKDNKTAYEYLKNKSGNVINGHKLLRIEEYEAHVNRYFKNNRQKLKNLLSEDIPEGFINRQLNDSRYISKLIKGLLSNIVRQENEQEATSKNLIPVTGAVTSKLKNDWGLNDKWNELILPRFERLNQLTQTKNFTTSNTNGNTIPTVPDDLLKGFSKKRIDHRHHALDALVVACCTRNHVQYLNALNAEKANYGLRKKLLIVNEQGDFTKIFQMPWKGFTSEAKNQLEKTVISFKQNLRVINKANNKFWSFKDENGNINLDKNGRPVKKLRKQTKGDNWAIRKAMHKETVSGKSNIETPKGKIATAVRGSLADIKNEKHLGKITDVQIREVILPNHLKNYVDEKGKVKFDLAFNDEGIEDLNKNIIALNNGKKHQPIRKVKFFEVGSKFSISENENSAKSKKYVEAAKGTNLFFAVYWDEKKQKRNYETVPLNEVIAHQKQVAHLTNNERLPIQTNRKKGDF
LFTLSPNDLVYVPTDAEVANKQPIDFKNLHQNQVNRIYKMVSSSGNQCFFIKDKIATSIWNKNEFSSLNKMEKDIDGNMIKERCIKLNVDRLGNITKA MAD2008 52MKKILGLDLGTNSIGWALIEHNFDKKEGRIDDLGVRIIPMSADILGKFD Nickase 2AGQSHSQTAERTGYRGVRRLYQRDNLRRERLHRVLNILDFLPEHYAEHIDFEKRLGQFKEGKEIKLNYKSNKDSKFEFIFKASYNEMLAAFKKYQPGLFYVKANGTETKIPYDWTIYYLRKKALSQPLTKQELAWIILNFNQKRGYYQLRGEEIDDDKNKQFVQLKVKEVIDSGEAVKGKKLFNVIFENGWKYDKQVVKTEDWIGRTKEFIVTTKTLKSGEIKRTYKAVDSEKDWAAIKAKTEQDIERSNKTVGEFIYEALLQDPTQKIRGKLVKTIERKFYKAELREILRKQIELQPQLFTTKLYNACIKELYPNNEAHRNSIKNRDFLYLFLDDIIFYQRPLKSQKSNISGCHLEQRIYTKINPVSGKKEEVKQAVKAIPKSHPIFQEFRIWQWLQNLKIYDKINTDKGELADVTNQLLPSEESLLDLFDYLQTKKELDQSGFIKYFIDKKLINKSEKENYRWNYVEDKKYPFAETRAQFISRLNKVKNINNISEFLNKKTRLGEKESSPFVTRIEQLWHIIYSVSDINEYKSALEKFALKHDIDKESFVANFIKFPPFKSDYGSYSKKALSKLLPLMRRGKYWNESDISNKVKQRVSDIMERVNALNLKENYNAKELAEALKTVSDDDVKKQLIKSFVPFKDKNPLKGLNTYQATYLVYGRHSEVGDIQSWKTPEDIDTYLKNFKQHSLRNPIVEQVVTETLRVVRDIWIHYGKSQLNFFNEIHVELGREMKNPADKRKQISNRNIENENTNNRIREILKDLKNDTSIEGDIRDYSPSQQDLLKIYEEGVYQNPKVDYSKVSEDEITKIRRSNSPTPKEIQRYRLWLEQGYISPYTGKPIPLSKLFTHEYQIE A IIPQSRYFDNSLSNKIICESAVNEDKDNKTAYEYLKNKSGNVINGHKLLRIEEYEAHVNRYFKNNRQKLKNLLSEDIPEGFINRQLNDSRYISKLIKGLLSNIVRQENEQEATSKNLIPVTGAVTSKLKNDWGLNDKWNELILPRFERLNQLTQTKNFTTSNTNGNTIPTVPDDLLKGFSKKRIDHRHHALDALVVACCTRNHVQYLNALNAEKANYGLRKKLLIVNEQGDFTKIFQMPWKGFTSEAKNQLEKTVISFKQNLRVINKANNKFWSFKDENGNINLDKNGRPVKKLRKQTKGDNWAIRKAMHKETVSGKSNIETPKGKIATAVRGSLADIKNEKHLGKITDVQIREVILPNHLKNYVDEKGKVKFDLAFNDEGIEDLNKNIIALNNGKKHQPIRKVKFFEVGSKFSISENENSAKSKKYVEAAKGTNLFFAVYWDEKKQKRNYETVPLNEVIAHQKQVAHLTNNERLPIQTNRKKGDFLFTLSPNDLVYVPTDAEVANKQPIDFKNLHQNQVNRIYKMVSSSGNQCFFIKDKIATSIWNKNEFSSLNKMEKDIDGNMIKERCIKLNVDRLGNITKA dMAD2008 53 MKKILGL 
ALGTNSIGWALIEHNFDKKEGRIDDLGVRIIPMSADILGKFDAGQSHSQTAERTGYRGVRRLYQRDNLRRERLHRVLNILDFLPEHYAEHIDFEKRLGQFKEGKEIKLNYKSNKDSKFEFIFKASYNEMLAAFKKYQPGLFYVKANGTETKIPYDWTIYYLRKKALSQPLTKQELAWIILNFNQKRGYYQLRGEEIDDDKNKQFVQLKVKEVIDSGEAVKGKKLFNVIFENGWKYDKQVVKTEDWIGRTKEFIVTTKTLKSGEIKRTYKAVDSEKDWAAIKAKTEQDIERSNKTVGEFIYEALLQDPTQKIRGKLVKTIERKFYKAELREILRKQIELQPQLFTTKLYNACIKELYPNNEAHRNSIKNRDFLYLFLDDIIFYQRPLKSQKSNISGCHLEQRIYTKINPVSGKKEEVKQAVKAIPKSHPIFQEFRIWQWLQNLKIYDKINTDKGELADVTNQLLPSEESLLDLFDYLQTKKELDQSGFIKYFIDKKLINKSEKENYRWNYVEDKKYPFAETRAQFISRLNKVKNINNISEFLNKKTRLGEKESSPFVTRIEQLWHIIYSVSDINEYKSALEKFALKHDIDKESFVANFIKFPPFKSDYGSYSKKALSKLLPLMRRGKYWNESDISNKVKQRVSDIMERVNALNLKENYNAKELAEALKTVSDDDVKKQLIKSFVPFKDKNPLKGLNTYQATYLVYGRHSEVGDIQSWKTPEDIDTYLKNFKQHSLRNPIVEQVVTETLRVVRDIWIHYGKSQLNFFNEIHVELGREMKNPADKRKQISNRNIENENTNNRIREILKDLKNDTSIEGDIRDYSPSQQDLLKIYEEGVYQNPKVDYSKVSEDEITKIRRSNSPTPKEIQRYRLWLEQGYISPYTGKPIPLSKLFTHEYQIE A IIPQSRYFDNSLSNKIICESAVNEDKDNKTAYEYLKNKSGNVINGHKLLRIEEYEAHVNRYFKNNRQKLKNLLSEDIPEGFINRQLNDSRYISKLIKGLLSNIVRQENEQEATSKNLIPVTGAVTSKLKNDWGLNDKWNELILPRFERLNQLTQTKNFTTSNTNGNTIPTVPDDLLKGFSKKRIDHRHHALDALVVACCTRNHVQYLNALNAEKANYGLRKKLLIVNEQGDFTKIFQMPWKGFTSEAKNQLEKTVISFKQNLRVINKANNKFWSFKDENGNINLDKNGRPVKKLRKQTKGDNWAIRKAMHKETVSGKSNIETPKGKIATAVRGSLADIKNEKHLGKITDVQIREVILPNHLKNYVDEKGKVKFDLAFNDEGIEDLNKNIIALNNGKKHQPIRKVKFFEVGSKFSISENENSAKSKKYVEAAKGTNLFFAVYWDEKKQKRNYETVPLNEVIAHQKQVAHLTNNERLPIQTNRKKGDFLFTLSPNDLVYVPTDAEVANKQPIDFKNLHQNQVNRIYKMVSSSGNQCFFIKDKIATSIWNKNEFSSLNKMEKDIDGNMIKERCIKLNVDRLGNITKA MAD2009 54 MKKILGL ALGTNSIGWAVVNADAITRNDGSRYLKPNSISAAGSRIIPMS Nickase 
1ADVLGNFESGITVSQTKDRTDKRMARRLHERALLRRERLLRILSLMDFLPKHFASKINRYGKFTDDSEPKLAWRKNTEGKYEFIFQDAFNEMLAEFKDKQPEIVKEGKKIPYDWTIYYLRKKALEKALSKEELSWLLLQFNQKRGYYQLRGEEEDIPQDKKIEYLAQKVVKVEATDQKKGDDIWYNVYLENGMIYRRTSKAPLDWEGKIKEFIVTTDLEKDGTPKKDKEGNIKRSFRAPQEDDWTLLKKKTEADIEKSTKTVGCYIYDSLLNNPKQKIIGKLVRTVERKFYKEELTQILKKQVELIPELRNDNLYKQCIEELYPINEAHRNTIAKTDFANLFINDILFYQRPLKSKKSQIDNCPYEEHIFIDSKTGEKKKVPVKCITKSNPLFQEFRLWQFIQNLRIYQREKEIDGKLSTDVDITSECLKSEEDYVRLFDWLNDRESIEQEELLKYLFNTKKSKNKENPYRWNYVEDKVYPCNETRATILKGLSKCGINASVLSSEMEMALWHILYSVEDKKEIETALTHFAQKQGWNGEFAIVFSKLKPFKKDYGSYSEKAIKKLLSLMRMGKYWNQDNIDKNTLDRIDKIINGEYDEKISNRVRDNAINLKDISDFRGLPVWLACYIVYDRHSEAKDCTKWNTPEEIDSYLKKFKQHSLRNPIVEQVVTETLRTVRDIWKQEGQIDEIHLELGRDLKNPADKRKKMSENILKNENTNLRIKAMLMEFMNPGMGIENVRPYSPSQQDILRIYEENALENLTKDDEEFDFISKISKQAQPTKSDIVRYKCWLEQKYRSPYTGKTISLSKLFTSAYEIEHIIPQSRYFDDSFSNKVICEAEVNKLKDRQLGHEFIEEHHGEKVQLSQGEVVEILSVDAYEKFVKENYANNRVKMKKLLMENIPDEFIERQLNDSRYISKVVKGLLSNIVREKIDDENYEPEAVSKNLISCNGAVTDRLKKDWGMNDVWNSIILPRFIRMNQITGKDCFTTTNAEGHLIPQMPLELQKGFNKKRIDHRHHAMDAIVIACTTRDHVNLLNNEAAHSKFNATRYQLQRKLRCFEKAMIDGKEREVAKEFLKPWDSFTMDSKNILENIIVSFKQNQRVINKTTNTFQHFDENGKKTFVKQGKGNSWAIRKPMHKDTVFGEINLRKVKSVSLSDAIKVPERILNKRIKEKITELKNNKVDAKNIKKYIEEYHIGGYGIDTSKIDVFYFTKETKERFFATRKSLDSSFNQAKIEDSIADSGIQKILLAHLKSKNGDAEQAFSPDGIDEMNKNIVELNNGKFHQPILKVRVYEKADKFAVGQKGNKKVKFVEAAKGTNLFFAVFEKDGKRSYLTIPLNVMIDCQKKYGNQWKQNIESYLKEKDLVEKDVQLLFILSPNDLVYLPTENELKKGITNPDKDQIYKFVSCTSNEAHFIPSFVANPIVQTTELGSNNKAQRAWNNKMIKEICI PIEVDRLGNIK MAD2009 55MKKILGLDLGTNSIGWAVVNADAITRNDGSRYLKPNSISAAGSRIIPMS Nickase 
2ADVLGNFESGITVSQTKDRTDKRMARRLHERALLRRERLLRILSLMDFLPKHFASKINRYGKFTDDSEPKLAWRKNTEGKYEFIFQDAFNEMLAEFKDKQPEIVKEGKKIPYDWTIYYLRKKALEKALSKEELSWLLLQFNQKRGYYQLRGEEEDIPQDKKIEYLAQKVVKVEATDQKKGDDIWYNVYLENGMIYRRTSKAPLDWEGKIKEFIVTTDLEKDGTPKKDKEGNIKRSFRAPQEDDWTLLKKKTEADIEKSTKTVGCYIYDSLLNNPKQKIIGKLVRTVERKFYKEELTQILKKQVELIPELRNDNLYKQCIEELYPINEAHRNTIAKTDFANLFINDILFYQRPLKSKKSQIDNCPYEEHIFIDSKTGEKKKVPVKCITKSNPLFQEFREWQFIQNLRIYQREKEIDGKESTDVDITSECLKSEEDYVREFDWLNDRESIEQEELLKYLFNTKKSKNKENPYRWNYVEDKVYPCNETRATILKGLSKCGINASVLSSEMEMALWHILYSVEDKKEIETALTHFAQKQGWNGEFAIVFSKLKPFKKDYGSYSEKAIKKLLSLMRMGKYWNQDNIDKNTLDRIDKIINGEYDEKISNRVRDNAINLKDISDFRGLPVWLACYIVYDRHSEAKDCTKWNTPEEIDSYLKKFKQHSERNPIVEQVVTETERTVRDIWKQEGQIDEIHLELGRDLKNPADKRKKMSENILKNENTNLRIKAMLMEFMNPGMGIENVRPYSPSQQDILRIYEENALENLTKDDEEFDFISKISKQAQPTKSDIVRYKCWLEQKYRSPYTGKTISLSKLFTSAYEIE A IIPQSRYFDDSFSNKVICEAEVNKLKDRQLGHEFIEEHHGEKVQLSQGEVVEILSVDAYEKFVKENYANNRVKMKKELMENIPDEFIERQLNDSRYISKVVKGLESNIVREKIDDENYEPEAVSKNLISCNGAVTDRLKKDWGMNDVWNSIILPRFIRMNQITGKDCFTTTNAEGHLIPQMPLELQKGFNKKRIDHRHHAMDAIVIACTTRDHVNLENNEAAHSKFNATRYQLQRKERCFEKAMIDGKEREVAKEFLKPWDSFTMDSKNILENIIVSFKQNQRVINKTTNTFQHFDENGKKTFVKQGKGNSWAIRKPMHKDTVFGEINLRKVKSVSLSDAIKVPERILNKRIKEKITELKNNKVDAKNIKKYIEEYHIGGYGIDTSKIDVFYFTKETKERFFATRKSLDSSFNQAKIEDSIADSGIQKILLAHLKSKNGDAEQAFSPDGIDEMNKNIVELNNGKFHQPILKVRVYEKADKFAVGQKGNKKVKFVEAAKGTNEFFAVFEKDGKRSYLTIPENVMIDCQKKYGNQWKQNIESYLKEKDLVEKDVQLLFILSPNDLVYLPTENELKKGITNPDKDQIYKFVSCTSNEAHFIPSFVANPIVQTTELGSNNKAQRAWNNKMIKEICI PIEVDRLGNIK dMAD2009 56MKKILGL A 
LGTNSIGWAVVNADAITRNDGSRYLKPNSISAAGSRIIPMSADVLGNFESGITVSQTKDRTDKRMARRLHERALLRRERLERILSEMDFLPKHFASKINRYGKFTDDSEPKLAWRKNTEGKYEFIFQDAFNEMLAEFKDKQPEIVKEGKKIPYDWTIYYLRKKALEKALSKEELSWELLQFNQKRGYYQLRGEEEDIPQDKKIEYLAQKVVKVEATDQKKGDDIWYNVYLENGMIYRRTSKAPLDWEGKIKEFIVTTDLEKDGTPKKDKEGNIKRSFRAPQEDDWTELKKKTEADIEKSTKTVGCYIYDSLENNPKQKIIGKEVRTVERKFYKEELTQILKKQVELIPELRNDNLYKQCIEELYPINEAHRNTIAKTDFANLFINDILFYQRPLKSKKSQIDNCPYEEHIFIDSKTGEKKKVPVKCITKSNPLFQEFREWQFIQNLRIYQREKEIDGKESTDVDITSECLKSEEDYVREFDWLNDRESIEQEELLKYLFNTKKSKNKENPYRWNYVEDKVYPCNETRATILKGLSKCGINASVLSSEMEMALWHILYSVEDKKEIETALTHFAQKQGWNGEFAIVFSKLKPFKKDYGSYSEKAIKKLLSLMRMGKYWNQDNIDKNTLDRIDKIINGEYDEKISNRVRDNAINLKDISDFRGLPVWLACYIVYDRHSEAKDCTKWNTPEEIDSYLKKFKQHSERNPIVEQVVTETERTVRDIWKQEGQIDEIHLELGRDLKNPADKRKKMSENILKNENTNLRIKAMLMEFMNPGMGIENVRPYSPSQQDILRIYEENALENLTKDDEEFDFISKISKQAQPTKSDIVRYKCWLEQKYRSPYTGKTISLSKLFTSAYEIE A IIPQSRYFDDSFSNKVICEAEVNKLKDRQLGHEFIEEHHGEKVQLSQGEVVEILSVDAYEKFVKENYANNRVKMKKELMENIPDEFIERQLNDSRYISKVVKGLESNIVREKIDDENYEPEAVSKNLISCNGAVTDRLKKDWGMNDVWNSIILPRFIRMNQITGKDCFTTTNAEGHLIPQMPLELQKGFNKKRIDHRHHAMDAIVIACTTRDHVNLENNEAAHSKFNATRYQLQRKERCFEKAMIDGKEREVAKEFLKPWDSFTMDSKNILENIIVSFKQNQRVINKTTNTFQHFDENGKKTFVKQGKGNSWAIRKPMHKDTVFGEINLRKVKSVSLSDAIKVPERILNKRIKEKITELKNNKVDAKNIKKYIEEYHIGGYGIDTSKIDVFYFTKETKERFFATRKSLDSSFNQAKIEDSIADSGIQKILLAHLKSKNGDAEQAFSPDGIDEMNKNIVELNNGKFHQPILKVRVYEKADKFAVGQKGNKKVKFVEAAKGTNEFFAVFEKDGKRSYLTIPENVMIDCQKKYGNQWKQNIESYLKEKDLVEKDVQLLFILSPNDLVYLPTENELKKGITNPDKDQIYKFVSCTSNEAHFIPSFVANPIVQTTELGSNNKAQRAWNNKMIKEICI PIEVDRLGNIK MAD2011 57MKKDYVIGL A IGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKN Nickase 
1FWGVRLFEEGHTAEDRRLKRTARRRISRRRNRLRYLQAFFEEAMTALDENFFARLQESFLVPEDKKWHRHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMIIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKNFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFLKENQKKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLSTFKGQFSEEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILDYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRDTRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTENRGKSDDVPSKEVVKKMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRYNAKSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQAFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPVVAYTVLFTHEKGKKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLAYVEQHQPEFQEILERVVDFAEVHTLAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYETRRKVVD MAD2011 58MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKN Nickase 
2FWGVRLFEEGHTAEDRRLKRTARRRISRRRNRLRYLQAFFEEAMTALDENFFARLQESFLVPEDKKWHRHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMIIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKNFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFLKENQKKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLSTFKGQFSEEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILDYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRDTRLFLY YMQNGKDMYTGDELSLHRLS AYDIDHIIPQSFMKDDSLDNLVLVGSTENRGKSDDVPSKEVVKKMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRYNAKSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQAFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPVVAYTVLFTHEKGKKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLAYVEQHQPEFQEILERVVDFAEVHTLAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYETRRKVVD dMAD2011 59 MKKDYVIGL 
AIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRRLKRTARRRISRRRNRLRYLQAFFEEAMTALDENFFARLQESFLVPEDKKWHRHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMIIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKNFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFLKENQKKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLSTFKGQFSEEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILDYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRDTRLFLY YMQNGKDMYTGDELSLHRLS AYDIDHIIPQSFMKDDSLDNLVLVGSTENRGKSDDVPSKEVVKKMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRYNAKSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQAFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPVVAYTVLFTHEKGKKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLAYVEQHQPEFQEILERVVDFAEVHTLAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYETRRKVVD
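Since the underlining and bolding that mark the changed residues in Table 7 do not survive in plain text, the substitutions can be recovered by comparing a variant against its reference sequence position by position. The sketch below does this for equal-length sequences. As an illustration it uses only the first 20 residues printed for MAD2001 Nickase 2 (unchanged in this region) as the reference and the first 20 residues of MAD2001 Nickase 1 as the variant; the function name `substitutions` is an assumption.

```python
def substitutions(reference, variant):
    """List point substitutions as e.g. 'D13A' (1-based positions).

    Both sequences must have the same length; insertions and deletions
    are outside the scope of this simple comparison.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# N-terminal 20 residues as printed in Table 7 (spaces removed).
reference_nterm = "MKNTNEDYYLGLDIGTDSVG"   # MAD2001 Nickase 2, unchanged here
nickase1_nterm = "MKNTNEDYYLGLAIGTDSVG"    # MAD2001 Nickase 1

print(substitutions(reference_nterm, nickase1_nterm))
```

Running the same comparison over full-length sequences would list every residue that differs between a variant and its wildtype nuclease.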

Example 6 Screening for Active MAD2001 and MAD2007 Spacers in HEK293T-GFP Cells by Measuring GFP Loss of Function

In order to test if MAD2007 and MAD2001 are active in mammalian cells, a library of spacers targeting the GFP locus in HEK293T-GFP cells was designed. For MAD2001, 23 spacers targeting NNRC PAMs were designed in two gRNA scaffolds (g1 and g3). For MAD2007, 43 spacers targeting NNNSR PAMs were designed in 3 gRNA scaffolds (g1, g2 and g3). The gRNAs were cloned into a pComplete plasmid (CMV-MAD200x-U6-gRNA) and transformed into E. coli (NEB5 alpha strain). The resulting colonies were picked into 96-well midwell plates and grown overnight. The E. coli culture was used as a PCR template to amplify a 7 kb fragment that contains CMV promoter driven MAD2007/MAD2001 and U6 driven gRNA. 150 ng of the unpurified PCR product was used to transfect 20,000 HEK293T-GFP cells per well in 96-well plates with 1 μL of PolyFect transfection reagent. The cells were then incubated at 37° C. for 96 hours followed by flow cytometry. Using flow cytometry, GFP− cells and GFP+ cells in each well were measured. The percentage of GFP− cells in each well was plotted (see FIG. 7). Relative to the negative control, a number of gRNA spacers for MAD2001 and MAD2007 were found to be functional. Overall, 6 out of 23 designed spacers for MAD2001 were functional, whereas 34 out of 43 spacers designed for MAD2007 were functional. The spacer and PAM sequences identified using this GFP screen are shown in Table 8.

TABLE 8
Spacer (“hit”) name   Spacer Sequence           SEQ ID No.   PAM Sequence   SEQ ID No.
MAD2007-g1            GTAGCGGCTGAAGCACTGCA      60           cgccgtaggt     61
MAD2007-g2            GGCGTGCAGTGCTTCAGCCGCTA   62           ccccgaccac     63
MAD2007-g3            GCCGCTACCCCGACCACATGAAG   64           cagcacgact     65
MAD2007-g4            GGCCAGGGCACGGGCAGTTTGCC   66           ggtggtgcag     67
MAD2007-g5            GTGGTGCCCATCCTGGTCGAGCT   68           ggacggcgac     69
MAD2007-g8            GGCATCGCCCTCGCCCTCGCCGG   70           acacgctgaa     71
MAD2007-g9            GCCCTCGCCGGACACGCTGAACT   72           tgtggccgtt     73
MAD2007-g10           GGTGGTCACCAAAGTGGGCCAGG   74           gcacgggcag     75
MAD2007-g11           GTGTCCGGCGAGGGCGAGGGCGA   76           tgccacctac     77
MAD2007-g12           GTGAGCAAGGGCGAGGAGCTGTT   78           caccggggtg     79
MAD2007-g13           GTGGTCACCAAAGTGGGCCAGGG   80           cacgggcagt     81
MAD2007-g14           GCGAGGGCGAGGGCGATGCCACC   82           tacggcaagc     83
MAD2007-g15           GTGCTGCTTCATGTGGTCGGGGT   84           agcggctgaa     85
MAD2007-g16           GCTGAACTTGTGGCCGTTTACGT   86           cgccgtccag     87
MAD2007-g17           GCTGAAGCACTGCACGCCGTAGG   88           tcagggtggt     89
MAD2007-g18           GTCACCAAAGTGGGCCAGGGCA    90           cgggcagttt     91
MAD2007-g19           GTAAACGGCCACAAGTTCAGCGT   92           gtccggcgag     93
MAD2007-g20           GTGGCATCGCCCTCGCCCTCGCC   94           ggacacgctg     95
MAD2007-g21           GGGGTAGCGGCTGAAGCACTGCA   96           cgccgtaggt     97
MAD2007-g22           GGCACGGGCAGTTTGCCGGTGGT   98           gcagatgaac     99
MAD2007-g23           GGGGTGGTGCCCATCCTGGTCGA   100          gctggacggc     101
MAD2007-g24           GCGTGCAGTGCTTCAGCCGCTAC   102          cccgaccaca     103
MAD2007-g25           GTCACCAAAGTGGGCCAGGGCAC   104          gggcagtttg     105
MAD2007-g26           GACGGCGACGTAAACGGCCACAA   106          gttcagcgtg     107
MAD2007-g27           GCGTGCAGTGCTTCAGCCGCTA    108          ccccgaccac     109
MAD2007-g28           GCCACAAGTTCAGCGTGTCCGGC   110          gagggcgagg     111
MAD2007-g29           GGGTGGTGCCCATCCTGGTCGAG   112          ctggacggcg     113
MAD2007-g30           GTCAGGGTGGTCACCAAAGTGGG   114          ccagggcacg     115
MAD2007-g31           GTGCCCATCCTGGTCGAGCTGGA   116          cggcgacgta     117
MAD2007-g32           GCCGTAGGTGGCATCGCCCTCGC   118          cctcgccgga     119
MAD2007-g33           GGCGACGTAAACGGCCACAAGTT   120          cagcgtgtcc     121
MAD2007-g34           GAGCTGGACGGCGACGTAAACGG   122          ccacaagttc     123
MAD2007-g35           GTGGTGCCCATCCTGGTCGA      124          gctggacggc     125
MAD2007-g36           GGCCACAAGTTCAGCGTGTCCGG   126          cgagggcgag     127
MAD2001-g1            GGTGGTCACCAAAGTGGGCCAGG   128          gcacgggcag     129
MAD2001-g2            GTCCGGCGAGGGCGAGGGCGATG   130          ccacctacgg     131
MAD2001-g4            GTGCCCATCCTGGTCGAGCTGGA   132          cggcgacgta     133
MAD2001-g5            GTGTCCGGCGAGGGCGAGGGCGA   134          tgccacctac     135
MAD2001-g6            GGGTCAGCTTGCCGTAGGTGGCA   136          tcgccctcgc     137
MAD2001-g7            GCCGCTACCCCGACCACATGAAG   138          cagcacgact     139

Additionally, the PAM regions of the 34 MAD2007 spacers that were functional in HEK293T cells were used to generate a sequence logo, which is shown in FIG. 8. The results show that the PAM of MAD2007 in HEK293T cells is NNNSR.
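As an illustration only, the NNNSR consensus can be read with the standard IUPAC nucleotide codes (N = any base, S = G or C, R = A or G). The following minimal Python sketch checks whether a PAM sequence fits that consensus. The function name `matches_pam` is hypothetical, and applying the consensus to the first five bases of a 10-nt PAM sequence from Table 8 is an assumption of this sketch, not something stated in the disclosure. Because a sequence logo summarizes a preference across many hits, an individual PAM sequence may deviate from the consensus.

```python
# Standard IUPAC nucleotide codes needed for the NNNSR consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "S": "CG",    # strong: G or C
    "R": "AG",    # purine: A or G
}


def matches_pam(seq, consensus="NNNSR"):
    """Return True if the first len(consensus) bases of seq fit the consensus.

    Assumption (not from the source): the consensus is aligned to the
    start of the listed PAM sequence.
    """
    seq = seq.upper()
    if len(seq) < len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, consensus))


# PAM sequences taken from Table 8 (SEQ ID Nos. 61 and 91).
print(matches_pam("cgccgtaggt"))
print(matches_pam("cgggcagttt"))
```

Under the stated alignment assumption, the first call evaluates the window `cgccg` and the second evaluates `cgggc`, whose fifth base is not a purine.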

Next, some of the MAD2007 spacer hits identified in the screen were tested using plasmid transfections in HEK293T-GFP cells for validation. The results are shown in FIG. 9. Relative to the no gRNA control, the hits were found to be active. However, the level of activity varied depending on the spacer sequence. The activity of MAD2007 is comparable to that of MAD7 but lower than that of SpCas9.

Example 7 Codon Optimization of MAD2007

All the data discussed thus far were generated using E. coli codon-optimized MAD2007 (EcMAD2007). MAD2007 was then optimized for human cells, and two codon-optimized versions (hsMAD2007v1 [SEQ ID NO. 140] and hsMAD2007v2 [SEQ ID NO. 141]) were designed. The designs were cloned into the pComplete vector, which contains CMV-driven MAD2007-T2A-dsRed and U6-driven gRNA (g11 with scaffold 1 from above). For each design, four separate clones (c1, c2, c3 and c4) were picked and tested in HEK293T-GFP cells for GFP loss of function. The results are shown in FIG. 9. Based on dsRed expression, hsMAD2007v2 showed higher expression relative to EcMAD2007. Furthermore, based on the percentage of GFP− cells, hsMAD2007v2 showed higher activity relative to EcMAD2007.

Example 8 MAD2007 Homologs

A bioinformatic search for MAD2007-like protein sequences was performed in the public databases and three sequences from Sharpea azabuensis were identified that are ≥98% identical to MAD2007. These sequences are shown in Table 9.

TABLE 9

MAD2007 homolog / Protein Sequence

MAD2007 [SEQ ID No. 7]
MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRREKRSAHRENRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDKDEKEMEKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR

MAD2007-like protein 1 (WP_033162146) [SEQ ID No. 142]
MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSILIYMSNNRGYKDFYDNDINDNNTDNDEKEMQKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGEITNKELSSFLLKHGLELNSKEKSNKKYKLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPSKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAIQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYGTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKEILEKELTERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVIDTLKPDLKNYIKAIDILTQEEYTKRCLEYYNDSEFAEQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSIKNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR

MAD2007-like protein 2 (WP_074732643) [SEQ ID No. 143]
MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRRLKRSAHRLNRRKKQRKESLIKFLQEIEFPDLNNILDSFKKQKNPNDILSLRVKGLDNKLSPLELFSVLIYMSNNRGYKDFYDNDINEDKKDSDEKEMQKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNITIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFFTDSNGVIMSFSKSLLHDLMLYFFDHKGELTNKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYGTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKEILEKELTERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVIDTLKPDLKNYIKAIDILTQEEYTKRCLEYYNDSEFAEQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSIKNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR

MAD2007-like protein 3 (WP_164121414) [SEQ ID No. 144]
MENYRQKHRFVLATDLGIGSNGWAIIDLDAHRVEDLGVQIFESGEEGAKKASARASQQRRLKRSAHRLNRRKKQRKEALIKFLQEIEFPDLVEILNSFKKQKNPNDILSLRVKGLDNKLSPLELFSVLIYMSNNRGYKDFYDNDINEDKKDSDEKEMQKAKSTIEKLFASNSYRTVGEMIATDPTFIVDKSGSKKVIKYHNKKGYQYLIPRKLLENEMSLILHKQEEFYDCLSIDNVTIILDKIFFQRNFEDGPGPKNKRDDYKNNSKGNQFYTGFNEMIGLCPFYPNEKKGTKNSLIYDEYYLINTLSQFFPTDSNGVIMSFSKSLLHDLMLYFFDHKGELTYKELSSFLLKHGLELNSKEKSNKKYRLNYMKQLTDSTIFETEMIASFREEIETSSYRSVNSLSNKIGNCIGQFITPLKRKEELTNILIDTNYPKELASKLADSIKVIKSQSVANISNKYMLEAIHAFESGKKYGDFQAEFNETRELEDHHFMKNNKLIAFQDSDLIRNPVVYRTINQSRKIINAAINKYNIVRINIEVASDVNKSFEQRDNDKKYQNDNYEKNLQLESELTDYINKENLHVNVNSKMMERYKLYLSQNKHCIYTNTPLTMMDVIYSTNVQVDHIIPQSKILDDTLNNKVLVLRDANSIKNNRLPLEAFDEMQINVDTNYTKKDYLTECLHLLKNKTNPISKKKYQYLTLKKLDDETIEGFISRNINDTRYITRYIANYLKTAFKESDKTKNIDVVTIKGAVTSRFRKRWLTTYDEYGYHPTIYSLEDKGRNLYYYHHAIDAIILANIDKRYITLANAYDTIRLIKIDRNLSKEQKQRDIDTVIKNTVKSMSKYHGFSEDYIRSLMSKNHIPAICKNLSDEVQIRIPLKFNTDYDNLGYRFTDDQYHYKKLYIAFKEAQNALKEKETLEKELIERFNNEAQILNANIILTYTGFESNNELIDIKKAKKVTDTLKPNLKNYIKAIDILTQEEYTKRCLEYYNDSEFATQLKIPYVNFKINKRFRGKIQGSENAVSLREVLKKTKLNSFEEFESYLKSEDGIKSPYYIKYTKNTLGKESYTIYEANSYYCAEIYTDSQNKPQLRGIRYVDVRKEDGKLVLLKPLPSTCKHITYLFHNEYIAIYKDSNYKRLKNNGFGAYRSINNVNVNKIIIRLFANQNLNDNDVVITSSIFIKKYSLDVFGHINGEIKCGDQSLFTIKKR
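
The disclosure does not state how the ≥98% identity figure was calculated. As a minimal sketch only, assuming an ungapped, position-by-position comparison of two sequences of equal length, percent identity could be computed as follows. The function name `percent_identity` is hypothetical, and real homolog comparisons would normally use a sequence alignment tool that handles insertions and deletions.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues.

    Assumption (not from the source): the two sequences are already
    aligned without gaps and therefore have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Short corresponding fragments copied from Table 9:
# MAD2007 (SEQ ID No. 7) and MAD2007-like protein 1 (SEQ ID No. 142).
mad2007_fragment = "ASQQRREKRSAHRENRRKK"
homolog1_fragment = "ASQQRRLKRSAHRLNRRKK"

print(round(percent_identity(mad2007_fragment, homolog1_fragment), 1))
```

The fragment is deliberately short and spans a region where the two proteins differ, so its identity value is much lower than the full-length figure and should not be read as a property of the whole proteins.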

While this invention is satisfied by embodiments in many different forms, as described in detail in connection with preferred embodiments of the invention, it is understood that the present disclosure is to be considered as exemplary of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated and described herein. Numerous variations may be made by persons skilled in the art without departure from the spirit of the invention. The scope of the invention will be measured by the appended claims and their equivalents. The abstract and the title are not to be construed as limiting the scope of the present invention, as their purpose is to enable the appropriate authorities, as well as the general public, to quickly determine the general nature of the invention. In the claims that follow, unless the term “means” is used, none of the features or elements recited therein should be construed as means-plus-function limitations pursuant to 35 U.S.C. § 112, ¶6.

We claim:
 1. A nuclease system configured to perform nucleic acid-guided nuclease editing, wherein the nuclease system is selected from a nuclease system comprising SEQ ID NO. 11, SEQ ID NO. 43 and SEQ ID NO. 44; a nuclease system comprising SEQ ID NO. 11, SEQ ID NO. 45 and SEQ ID NO. 46; and a nuclease system comprising SEQ ID NO. 11, SEQ ID NO. 47 and SEQ ID NO. 48.
 2. The nuclease system of claim 1 comprising SEQ ID NO. 11, SEQ ID NO. 43 and SEQ ID NO. 44.
 3. The nuclease system of claim 1 comprising SEQ ID NO. 11, SEQ ID NO. 45 and SEQ ID NO. 46.
 4. The nuclease system of claim 1 comprising SEQ ID NO. 11, SEQ ID NO. 47 and SEQ ID NO. 48.
 5. A nickase having an amino acid sequence selected from SEQ ID NOs. 57 and 58.
 6. A dead nuclease having an amino acid sequence of SEQ ID NO. 59.